Abstract

A fully recurrent neural network was applied to the function prediction problem.
The real-time recurrent learning (RTRL) algorithm was modified and tested for use as a
viable function predictor. The modification gave the algorithm a variable learning rate and
a linear/sigmoidal output selection. After the network's ability to temporally learn both
the classic exclusive-OR (XOR) problem and the internal state problem was verified, the
network was used to simulate the frequency response of a second order IIR lowpass
Butterworth filter. The recurrent network was then applied to two problems: head position
tracking and voice data reconstruction. The accuracy with which the network predicted the
pilot's head position was compared to the best linear statistical prediction algorithm. The
application of the network to the reconstruction of voice data showed the recurrent network's
ability to learn temporally encoded sequences and to decide whether a speech signal sample
was a fricative or a voiced portion of speech.
FUNCTION PREDICTION
USING  RECURRENT  NEURAL  NETWORKS 


I.  Introduction

The ability of machines to perform accurate function prediction remains an unsolved
problem. Although conventional sensors used in military applications provide enough
information for a human to predict an event's outcome, the extension to automatic prediction
by machines is still impractical using current computer architectures. According to Webster
(8), to predict is to

declare in advance; esp: foretell on the basis of observation, experience, or
scientific reason.

Therefore,  function  prediction,  as  defined  in  this  thesis,  is  the  declaration  of  the  future 
value  of  a  specific  function  based  upon  that  function’s  history. 

The use of recurrent neural network theory provides a novel approach to solving this
problem. Biological neural networks readily and easily process temporal information;
artificial neural networks should do the same. Formulated from biological research, artificial
neural networks provide a unique approach to solving problems that could prove quite
successful in the areas of speech processing, image recognition, and function prediction (7).
Recurrent neural networks are artificial neural networks which permit the encoding and
learning of temporal sequences. This is an important feature in a world governed by
time-dependent processes. Thus, properly trained recurrent neural networks could prove quite
successful in applications involving time dependencies, including function prediction.
1.1  Problem


The  goal  of  this  thesis  is  to  perform  accurate  function  prediction  using  recurrent 
neural  networks. 

1.2  Background 

Publications  in  the  field  of  neural  networks  span  a  multi-disciplinary  spectrum: 
neurobiology,  physics,  psychology,  medical  science,  mathematics,  computer  science,  and 
engineering.  As  such,  it  is  difficult  to  accurately  compile  a  thorough  summary  of  where 
neural  network  technology  stands  today.  However,  a  broad  sampling  of  current  literature 
centered  on  the  topic  of  recurrent  backpropagation  neural  networks  yields  a  more  focused 
review. Chapter II contains highlights of some of the most promising recurrent neural
network algorithms, namely backpropagation through time (BPTT), modified BPTT,
real-time recurrent learning (RTRL), and subgrouped RTRL.

As technology improves, new and innovative algorithms are discovered which help
researchers and engineers alike in solving time-dependent problems. All of the algorithms
named above possess the ability, if properly trained, to tackle many difficult temporal
tasks. Several of these algorithms are simply modifications of the backpropagation
through time method, or the real-time recurrent learning method.

1.3  Assumptions 

It  is  assumed  that  the  input  feature  vectors  have  already  been  selected  for  use  in 
training  and  testing  the  network. 

1.4  Scope 

The  scope  of  this  thesis  will  focus  on  solving  the  function  prediction  problem  using 
recurrent  neural  network  theory.  This  theory  is  based  on  a  modification  of  the  RTRL 
algorithm  (23). 
1.5  Approach

Function  prediction  using  recurrent  neural  networks  will  be  accomplished  in  four 
steps.  First,  the  recurrent  neural  network  program  must  be  created.  The  RTRL  algorithm 
will  be  coded  using  the  C  programming  language.  It  will  be  tested  using  several  temporally 
encoded  data  sets  to  verify  its  performance.  Second,  the  network’s  output  will  be  modified 
to  determine  if  linear  outputs  combined  with  sigmoidal  "hidden  units"  (processing  units 
which  have  no  external  connections)  will  further  optimize  the  network’s  response.  This 
modification  will  enable  the  network  to  predict  unbounded  functions.  Third,  a  variable 
learning  rate  will  be  added  to  the  network  training  algorithm  to  enhance  the  rate  of 
convergence.  Finally,  several  functions  will  be  used  to  test  the  network’s  prediction 
abilities,  including  two  specific  applications. 
II.  Literature Review


2.1  Introduction 

In  this  literature  review,  the  current  state  of  recurrent  neural  network  technology  is 
summarized. 

Biological  neural  networks  readily  and  easily  process  temporal  information;  artificial 
neural  networks  should  do  the  same.  Formulated  from  biological  research,  artificial  neural 
networks  provide  a  heuristic  approach  to  solving  problems  that  could  prove  quite  successful 
in  the  areas  of  speech  processing  and  image  recognition  (7).  Recurrent  neural  networks 
are  artificial  neural  networks  which  use  feedback  to  encode  and  learn  temporal  sequences. 
This  is  an  important  feature  in  a  world  governed  by  time  dependent  processes.  Thus, 
properly  trained  recurrent  neural  networks  could  prove  quite  successful  in  applications 
involving  time  dependencies. 

This section contains a short background on basic neural network theory to aid
the reader's comprehension of that subject. In addition, the following algorithms are
highlighted: backpropagation through time (BPTT), modified BPTT, real-time recurrent
learning (RTRL), and subgrouped RTRL. These algorithms summarize the current
improvements in recurrent backpropagation neural network technology.

2.2  Background 

Artificial  neural  networks  are  nothing  more  than  an  application  of  biological  concepts 
to  electronic  machines.  Another  name  for  an  artificial  neural  network  is  a  neuromime.  It  is 
called  a  neuromime  because  it  attempts  to  copy  or  mimic  the  response  of  a  true  biological 
neuron,  the  most  basic  processing  element  of  the  brain  (13). 

During  the  late  1950’s,  Rosenblatt  invented  a  new  class  of  machines  which  seemed 
to  offer  what  many  researchers  thought  was  a  natural  and  powerful  model  of  machine 
learning  (15).  It  was  called  the  perceptron.  The  basic  perceptron  model  consists  of  an 
array of input sensory nodes randomly connected to a second array of associative nodes.
The random connections are called weights. The weights are randomly generated values
in the range [-1,1]. Each of the secondary nodes produces an output only if enough of
the sensory nodes connected to it are activated. The sensory nodes can be viewed as the
means by which outside information is captured by the machine, and the associative nodes
can be viewed as the input to the machine.

The output, or response, of the perceptron is proportional to the weighted sum of the
associative node responses. In other words, if $x_i$ denotes the response of the ith associative
node and $w_i$ denotes the corresponding connection weight, then the response is given by

$$R = \sum_{i=1}^{n} w_i x_i$$

where n is the total number of associative nodes. Thus for a positive R, the stimulus is
said to belong to class 1, and for a negative R, the stimulus is said to belong to class 2.
That is how a decision is made. In its most basic form, the perceptron is simply
an implementation of a linear decision function. The perceptron learns by changing the
connection weights in such a way as to minimize the total response error. The nodal error
is the difference between the desired output and the actual computed response of the node.
In equation form, the error is given by

$$e_n = d_n - R_n$$

where $e_n$ is the error of node n, $d_n$ is the desired value of node n, and $R_n$ is the
computed response of node n. Therefore, the total response error is the summation of the
nodal errors over the entire length of the data set
(epoch).
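The decision rule and nodal error described above can be illustrated with a minimal Python sketch. The function names, the example weights, and the choice to assign a response of exactly zero to class 2 are assumptions made for illustration; the text only defines the positive and negative cases.

```python
def perceptron_response(weights, inputs):
    """Weighted sum R of the associative-node responses."""
    return sum(w * x for w, x in zip(weights, inputs))


def classify(response):
    """Class 1 for a positive response, class 2 otherwise.

    Treating a response of exactly zero as class 2 is an assumption.
    """
    return 1 if response > 0 else 2


def nodal_error(desired, response):
    """Difference between the desired value and the computed response."""
    return desired - response


# Hypothetical weights in [-1, 1] and associative-node responses.
weights = [0.5, -0.25, 0.75]
inputs = [1.0, 1.0, 0.0]
r = perceptron_response(weights, inputs)
```

With these illustrative values the response is 0.5 - 0.25 = 0.25, so the stimulus is assigned to class 1.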

In most applications, the output of the network is processed by a differentiable
function, usually the logistic squashing function (sigmoid). The output of the sigmoid is
given by

$$f(x) = \frac{1}{1 + e^{-x}} \qquad (1)$$

Figure 1. Single-layer perceptron with sigmoidal processing. The output of the node is
the weighted sum of the inputs, processed through the sigmoid function. The
sigmoid function is displayed in the lower part of the figure.

When the network input is processed by this function, the response is the weighted
sum of the inputs, including the bias term $\theta$, processed through the sigmoid function. Thus
the resulting output is given by

$$R = f\left(\sum_{i=1}^{n} w_i x_i + \theta\right)$$

Figure  1  details  the  output  of  the  sigmoid  and  gives  a  good  picture  of  what  the  network 
node  should  be  viewed  as. 
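As a small illustration of a single sigmoidal node, the following Python sketch computes the weighted sum of the inputs plus a bias and passes it through the logistic function. The function names and the sample values are hypothetical, and the two-branch form of the sigmoid is an implementation choice made here only to avoid floating-point overflow for large negative activations.

```python
import math


def sigmoid(x):
    """Logistic squashing function 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def node_output(weights, inputs, bias):
    """Weighted sum of the inputs plus the bias, passed through the sigmoid."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(activation)


# Hypothetical example: the weighted inputs cancel, so the activation is zero.
out = node_output([1.0, -1.0], [2.0, 2.0], 0.0)
```

An activation of zero maps to the midpoint of the sigmoid, 0.5; large positive or negative activations saturate toward 1 or 0.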

To date, many other architectures have been proposed which extend the basic concepts
introduced by Rosenblatt. These new networks are called by various names: multilayer
perceptrons, feedforward neural networks, backpropagation networks, recurrent
backpropagation, and so on. The term backpropagation refers to the way interconnection
weights are updated; that is, by propagating backward from the output to the input, changing
each connection weight in such a way as to minimize the total error. A recurrently
connected neural network is a backpropagation network that contains feedback loops from
previous states (timed inputs). The outputs that feed back are used as part of the next
sequentially timed input. So, the output at time t + 1 is predictive based upon the current
input and the previous output. As with the input vector, the feedback connections each
have their own adaptable weights. These recurrent weights are changed just as before in
order to minimize the total error over the epoch length. Figure 2 shows a general layout
of a recurrent neural network. Notice that the current input vector at time t is composed of
a bias (always equal to one), the external inputs, and the network's previous output. This
is a convenient way to show how feedback is processed through the network.

2.3  Scope 

Publications in the field of neural networks span a multi-disciplinary spectrum:
neurobiology,  physics,  psychology,  medical  science,  mathematics,  computer  science,  and 
engineering.  As  such,  it  is  difficult  to  accurately  compile  a  thorough  summary  of  where 
neural  network  technology  stands  today.  However,  a  broad  sampling  of  current  literature 
centered  on  the  topic  of  recurrent  backpropagation  neural  networks  yields  a  more  focused 
review. 

The scope of this review will focus on current literature detailing the improvements
in recurrent backpropagation neural network technology. Most of the improvements
presented are simply modified versions of previously published work.

2.4  Backpropagation  through  time  (BPTT) 

Much of the current research has focused on the use of recurrent neural networks that
deal with time-varying input or output in nontrivial ways. Rumelhart describes a general
framework for such a problem as a recurrent network which unfolds into a multilayer
feedforward network that grows by one layer on each time step (18). The adjustments to
the network's connection weights are designed to minimize a time-averaged measure of
the network's overall learning error. This is referred to as backpropagation through time
(BPTT). Its strength lies in its generality, but a corresponding weakness is its growing
memory requirement when trained on arbitrarily long sequences. It is this method (BPTT)
which most researchers tend to modify. For example, Rohwer and Forrest (14) presented
a variation of the backpropagation feedforward training method (18). Backpropagation
itself can be indirectly applied to time-dependent problems in arbitrarily connected networks
by modeling a virtual network made from several copies of the original, with one copy
for each time step. In the Rohwer and Forrest variation, the adjustments to the network's
connection weights are again designed to minimize a time-averaged measure of the network's
overall learning error, but errors are assessed and handled simultaneously throughout the
network rather than propagated through it. The variation can be applied directly to
arbitrarily connected networks, provided that a certain criterion related to the training
problem is satisfied. When this criterion is not met, a modification of the training problem
can be found which properly improves the stability of the network.

Figure 2. A general recurrent neural network. The number of external inputs, external
outputs and nodes are user-defined. The output at time t + 1 is predictive
based upon the current input and the previous output. The network is fully
interconnected by connection weights, adjusted using the gradient-descent
method. Feedback is introduced by using the network's previous output as part
of the current input.

2.5  Modified BPTT

There are many recurrent neural network models whose architectures are modified
versions of previously published work. For example, Pineda (td) has recently generalized
Rumelhart's backpropagation learning algorithm for feedforward neural networks (18)
to recurrent neural networks. Pearlmutter (9) has further generalized this algorithm to
recurrent networks that produce time-dependent trajectories. The Pearlmutter architecture
requires much more training time than that of the Rumelhart or Pineda algorithms. As
a result, Fang and Sejnowski (2) modified the Pearlmutter algorithm to improve both its
performance and speed. The Fang-Sejnowski article detailed modifications to the
learning update rule which allow adaptable, independent learning rates for individual
parameters in the algorithm. This allows fast parameter estimation while avoiding most
cases of catastrophic divergence.

2.6  Real-Time Recurrent Learning (RTRL)

One particularly interesting article describes a learning algorithm for training
completely recurrent, continually updated networks to learn temporal tasks (23). This technique
emphasizes using uniform starting configurations that contain no previously known
information about the temporal nature of the task. More precisely, it is a gradient-following
learning algorithm which tracks the total network error along a trajectory which minimizes
this total error. Its main advantage is that it does not require a precisely defined training
interval. It operates while the system is running. A disadvantage is that it requires nonlocal
communication during training. This means it is computationally expensive. Yet, the
algorithm allows recurrently connected networks to learn complex tasks that require the
retention of information over fixed or indefinite time periods. This algorithm is referred
to as the real-time recurrent learning (RTRL) algorithm. It is this algorithm that this thesis
effort is based upon.

2.7  Subgrouped  RTRL 

Whereas RTRL has been shown to have great power and generality, it has the
disadvantage of requiring a great deal of computation time (CPU intensive). To address this
problem, Zipser proposed an improved technique which reduces the amount of computation
required by RTRL without changes in network connectivity (24). The reduction in
computation time is a result of network subgrouping. The original network is divided into
subnets for the purpose of error propagation, leaving them undivided for activity
propagation. This means that during training, the network is subgrouped only when the error
is propagated backward through the network's connection weights. During the normal
feedforward propagation portion, the network remains fully connected. A comparison of
this new method and the previous RTRL method showed the subgrouped RTRL algorithm
to be 10 times faster on learning to be a finite-state part of a Turing machine (24).

2.8  Summary 

As technology improves, new and innovative algorithms are discovered which help
researchers and engineers alike in solving time-dependent problems. All of the algorithms
previously discussed possess the ability, if properly trained, to tackle or completely solve
many difficult temporal tasks. Several of these algorithms are simply modifications of the
backpropagation through time method, or the real-time recurrent learning method.

Although great strides have been made in advancing recurrent neural network
technology, further research is still needed. Most of these algorithms are implemented on
digital machines. Because of this, routines can be constructed in code which are not
physically realizable. That is, they cannot be implemented in hardware configurations.
Therefore, further research is necessary to determine whether or not these recurrently
connected networks can be realized as physical elements, thus greatly increasing their speed
and utility.
III.  Methodology


3.1  Introduction 

Citing  the  work  of  several  neural  network  researchers,  Chapter  II  covered  a  subset 
of  the  most  recent  research  into  recurrent  artificial  neural  network  algorithms.  Specifically 
noted  were  the  real-time  recurrent  learning  (RTRL)  algorithm,  and  the  subgrouped  RTRL 
algorithm.  This  thesis  seeks  to  encode  the  RTRL  algorithm,  perform  several  modifications, 
and  use  the  resulting  network  as  a  reliable  engine  for  function  prediction  problems. 

This chapter covers the development, modifications, and testing of the RTRL
algorithm for function prediction applications at AFIT. The basics of the RTRL algorithm
along with the current modifications of this algorithm are described. In addition, the testing
procedures and training methods used on the algorithm are discussed. The chapter
concludes with a description of how the recurrent neural network was applied to two specific
problems.

3.2  RTRL  Algorithm 

The  real-time  recurrent  learning  algorithm  (23)  is  a  gradient-following  algorithm  for 
completely  recurrent  networks  running  in  continually  sampled  time.  The  architecture  of 
the  network  consists  of  a  user  specified  number  of  input  nodes,  a  unity  input  bias,  and  a  user 
specified  number  of  "hidden  nodes"  and  output  nodes  (see  Figure  2).  The  output  nodes 
(or  hidden  nodes  for  that  matter)  were  designed  originally  to  process  the  nodal  activation 
using  the  sigmoid  function  (Eq  1).  The  nodal  activation  is  defined  as  the  weighted  sum 
of  all  the  inputs  to  a  particular  processing  node.  Each  output  node  is  fully  connected,  by 
weighted  connections,  to  every  other  node  in  the  network,  including  external  inputs  and 
previous  outputs.  Once  the  output  is  computed,  it  is  returned  and  used  as  a  part  of  the  new 
network  input. 
The derivation of the RTRL algorithm is contained in the article by Williams and
Zipser (23). In this thesis, only the most important equations will be highlighted. The
basic network has n units (nodes) and m external inputs. Any or all of the network units
can be outputs. Let $y_k(t)$ denote the output of the kth node at time t, and let $x_k(t)$ denote
the kth external input signal to the network at time t. Now define $z_k(t)$ to be the composite
network input at time t. In other words, $z(t)$ is obtained by concatenating $x(t)$ and $y(t)$,
so that

$$z_k(t) = \begin{cases} x_k(t) & \text{if } k \in I \\ y_k(t) & \text{if } k \in U \end{cases} \qquad (2)$$

where U denotes the set of indices k such that $z_k$ is the output of a unit in the network,
and where I denotes the set of indices k such that $z_k$ is an external input. Note: the unity
bias term is assumed to be a part of the m inputs. With a fully interconnected network, the
weight matrix $w_{ij}$ becomes a single n x (m + n) matrix, with i corresponding to a specific
node, and j corresponding to a specific input.


The output of the kth unit as a function of the input vector and connection weights
(the nodal activation) is given by

$$y_k(t+1) = f_k(s_k(t)) \qquad (3)$$

where $f_k$ is the unit's processing function, and the nodal activation $s_k(t)$ is given by

$$s_k(t) = \sum_{l \in U \cup I} w_{kl} z_l(t) \qquad (4)$$

For this thesis, the unit's processing function will be either sigmoidal (Eq 1), or a
combination of linear and sigmoidal units. Notice in Eq 3 that the external input at time t
does not influence the output of any unit until time t + 1. This means that given the current
input value at time t, the network will compute (predict) the output for time t + 1. This fact
is important when performing function prediction. In addition, it is important in knowing
how to set up the training and testing data sets so that the desired network output is located
at time t + 1 as opposed to time t.
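The forward dynamics of Eqs 2 through 4, and the one-step offset between input and target, can be sketched in Python as follows. This is only an illustrative sketch: the function names are hypothetical, the composite input is assumed to list the m external inputs (bias included) before the n unit outputs, and `linear_units` is an assumed way of marking which units use a linear processing function.

```python
import math


def sigmoid(s):
    """Logistic function (Eq 1), written to avoid overflow."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def forward_step(W, x, y, linear_units=()):
    """Return y(t+1) and z(t) from x(t) and y(t).

    W is an n x (m + n) weight matrix, x holds the m external inputs
    (bias included) and y holds the n unit outputs at time t.
    """
    z = list(x) + list(y)                                   # Eq 2
    s = [sum(w * zl for w, zl in zip(row, z)) for row in W]  # Eq 4
    y_next = [s_k if k in linear_units else sigmoid(s_k)     # Eq 3
              for k, s_k in enumerate(s)]
    return y_next, z


def make_prediction_pairs(series):
    """Pair each sample with the next one: input x(t), target d(t+1)."""
    return [(series[t], series[t + 1]) for t in range(len(series) - 1)]


# Two units, bias plus one external input, all weights zero.
W = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
y_next, z = forward_step(W, [1.0, 0.3], [0.0, 0.0], linear_units=(1,))
pairs = make_prediction_pairs([1, 2, 3])
```

With all weights zero, the sigmoidal unit outputs 0.5 and the linear unit outputs 0. The pairing helper shows the data-set alignment described above: the target for the sample presented at time t is the sample at time t + 1.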


Since Eqs 3 and 4 specify the entire discrete-time dynamics of the network, the weight
update equation must be specified according to these dynamics. This is accomplished by
measuring the network performance over time, and then computing its gradient in weight
space, following the negative gradient to a minimum total error. For this derivation, the
error is defined as

$$e_k(t) = \begin{cases} d_k(t) - y_k(t) & \text{if } k \in T \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$

where T denotes the set of indices $k \in U$ for which there exists a target value $d_k(t)$ that
the output of the kth unit should match. Therefore, the total network error is defined as

$$J_{total} = \sum_{t} J(t), \qquad J(t) = \frac{1}{2} \sum_{k \in U} [e_k(t)]^2 \qquad (6)$$

It  is  the  negative  gradient  of  this  total  error  that  must  be  followed  to  a  minimum  value. 
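A minimal Python sketch of the error measure may help make the definitions concrete. The function names are hypothetical, and representing "no target for this unit at this time step" by `None` is an assumption made for the illustration.

```python
def unit_errors(targets, outputs):
    """Eq 5: d_k(t) - y_k(t) where a target exists, zero otherwise.

    A target of None marks a unit with no desired value at this step.
    """
    return [0.0 if d is None else d - y for d, y in zip(targets, outputs)]


def total_error(target_seq, output_seq):
    """Eq 6: half the squared unit errors, summed over all time steps."""
    return sum(0.5 * sum(e * e for e in unit_errors(d, y))
               for d, y in zip(target_seq, output_seq))


# Two units over two time steps; only the first unit has a target.
targets = [[1.0, None], [0.0, None]]
outputs = [[0.5, 0.3], [0.5, 0.1]]
j_total = total_error(targets, outputs)
```

Here each time step contributes 0.5 * 0.5^2 = 0.125, so the total error over the two steps is 0.25; the untargeted unit contributes nothing.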

The weight update rule adjusts the weight matrix along a positive multiple of the
negative gradient of the total error. To achieve this weight update rule, an incremental
delta weight (weight change) value is required. This delta weight is initially defined as a
fixed multiple of the negative gradient of the error with respect to the connection weights
at each time step. In other words,

$$\Delta w_{ij}(t) = -\alpha \frac{\partial J(t)}{\partial w_{ij}} \qquad (7)$$

where $J(t)$ is the error at time step t and $\alpha$ is some fixed positive learning rate. For this
thesis, the learning rate will not be fixed. This modification will be further described in the
modification section to follow.

Therefore, following Williams and Zipser's derivation for gradient descent, the
algorithm must compute the trajectory by

$$-\frac{\partial J(t)}{\partial w_{ij}} = \sum_{k \in U} e_k(t) \frac{\partial y_k(t)}{\partial w_{ij}} \qquad (8)$$

where $\partial y_k(t) / \partial w_{ij}$ is a measure of the sensitivity of the output at time t to a small
change in $w_{ij}$. It is assumed that the initial conditions, external inputs, and remaining
weights are not altered at all during this sensitivity measure. Thus the sensitivity of the
network at some future time is given by


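$$\frac{\partial y_k(t+1)}{\partial w_{ij}} = f_k'(s_k(t)) \left[ \sum_{l \in U} w_{kl} \frac{\partial y_l(t)}{\partial w_{ij}} + \delta_{ik} z_j(t) \right] \qquad (9)$$

for all $k \in U$, $i \in U$, and $j \in U \cup I$. The term $\delta_{ik}$ denotes the Kronecker delta function.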
Thus,  by  defining  the  variable 


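$$p_{ij}^{k}(t+1) = \frac{\partial y_k(t+1)}{\partial w_{ij}}$$

the network dynamics are governed by

$$p_{ij}^{k}(t+1) = f_k'(s_k(t)) \left[ \sum_{l \in U} w_{kl} \, p_{ij}^{l}(t) + \delta_{ik} z_j(t) \right] \qquad (10)$$

where the initial conditions are defined as

$$p_{ij}^{k}(t_0) = 0.$$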


For this thesis, when the output function is sigmoidal, the derivative of the network
processing function with respect to the activation is given by

$$f_k'(s_k(t)) = y_k(t+1)[1 - y_k(t+1)]. \qquad (11)$$

When the output function is linear, the derivative of the network processing function with
respect to the activation is given by

$$f_k'(s_k(t)) = 1. \qquad (12)$$
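The equations above can be combined into a single on-line update step. The following Python sketch is one possible arrangement, not the thesis implementation (which the author states was written in C): the function names, the nested-list layout of the sensitivities `p[k][i][j]`, the input ordering (external inputs before unit outputs), and the decision to apply the weight change after the forward pass are all assumptions made for illustration.

```python
import math


def sigmoid(s):
    """Logistic function (Eq 1), written to avoid overflow."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def rtrl_step(W, p, x, y, targets, alpha, linear_units=()):
    """One RTRL time step; returns the updated (W, p, y).

    W: n x (m+n) weights, p[k][i][j]: sensitivity of y_k to w_ij,
    x: m external inputs (bias included), y: n unit outputs at time t,
    targets: desired outputs at time t (None where no target exists).
    """
    n, m = len(W), len(x)
    z = list(x) + list(y)                                         # Eq 2
    e = [0.0 if d is None else d - yk for d, yk in zip(targets, y)]  # Eq 5
    # Eqs 7 and 8: weight change from current errors and sensitivities.
    dW = [[alpha * sum(e[k] * p[k][i][j] for k in range(n))
           for j in range(m + n)] for i in range(n)]
    s = [sum(W[k][l] * z[l] for l in range(m + n)) for k in range(n)]  # Eq 4
    y_next = [s[k] if k in linear_units else sigmoid(s[k])            # Eq 3
              for k in range(n)]
    # Eq 12 for linear units, Eq 11 for sigmoidal units.
    fprime = [1.0 if k in linear_units else y_next[k] * (1.0 - y_next[k])
              for k in range(n)]
    # Eq 10: unit l's output sits at column m + l of the weight matrix.
    p_next = [[[fprime[k] * (sum(W[k][m + l] * p[l][i][j] for l in range(n))
                             + (z[j] if i == k else 0.0))
                for j in range(m + n)] for i in range(n)] for k in range(n)]
    W_next = [[W[i][j] + dW[i][j] for j in range(m + n)] for i in range(n)]
    return W_next, p_next, y_next


# One sigmoidal unit, bias plus one external input, zero initial state.
W0 = [[0.0, 0.0, 0.0]]
p0 = [[[0.0, 0.0, 0.0]]]
W1, p1, y1 = rtrl_step(W0, p0, [1.0, 0.5], [0.0], [None], alpha=0.1)
W2, p2, y2 = rtrl_step(W1, p1, [1.0, 0.5], y1, [1.0], alpha=0.1)
```

In this small example the first step has no target, so the weights stay at zero while the sensitivities become 0.25 times the composite input. The second step sees an error of 0.5 and moves the bias and input weights by 0.0125 and 0.00625 respectively. The sensitivity array holds n x n x (m + n) values, which is the source of the computational cost noted in Chapter II.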
The RTRL algorithm is a gradient-following algorithm. This means that it follows
the gradient descent method for computing weight updates. However, because of its
continuous  time  nature,  it  only  approximates  following  the  true  negative  gradient  of  the 
error  curve.  This  approximation  is  done  by  incrementally  updating  the  weights  at  each 
time  step  rather  than  by  the  traditional  batch  update  method,  where  the  weight  changes 
are  summed  and  then  added  to  the  existing  weights  at  the  end  of  each  epoch.  An  epoch 
is  simply  one  complete  pass  through  the  entire  data  set.  While  the  batch  method  follows 
the  true  gradient  of  the  total  error,  the  RTRL  technique  is  known  to  work  well  in  practice. 
The  use  of  a  small  enough  learning  rate  leads  to  a  net  weight  update  whose  direction  is  a 
close  enough  approximation  to  the  true  gradient. 

3.3  Modifications 

Several  modifications  have  been  added  to  the  RTRL  algorithm  to  better  adapt  it 
to  the  function  prediction  task.  First,  the  network  output  was  modified  to  enable  the 
external  output  units  to  process  as  linear  units  rather  than  as  sigmoidal  units.  The  hidden 
units  remain  as  sigmoidal  processors.  Only  the  external  output  units  were  modified. 
Although  the  original  algorithm  did  not  have  linear  outputs  with  sigmoidal  hidden  nodes, 
this  modification  would  greatly  increase  the  network’s  use  in  applications  where  the  output 
exceeds  the  range  of  0  to  1.  With  a  linear  output,  the  network’s  response  can  be  read  and 
interpreted  directly.  In  addition,  linear  outputs  do  not  require  the  desired  output  vector  to 
be  normalized,  thus  saving  computation  time. 

The second modification to the standard RTRL algorithm was to add a variable
learning rate to speed the convergence of the total error. The learning rate
varies based upon the stability of the total error accumulated over an entire epoch.
If the ratio of the previous total error to the current total error is less than a desired constant
less than one, or if the difference between the previous total error and the current total error
is less than zero, then the learning rate is reduced by a factor of two (an arbitrary choice).
Otherwise, the learning rate is not changed. If the difference between the previous total
error and the current total error is less than zero, the total error is beginning to increase
or oscillate. Reducing the learning rate at this point allows the total error to continue to
decrease until convergence. The ratio of the previous total error to the current total error
measures the incremental change in the total error in time. If this change is less than the
desired incremental change, the learning rate is reduced. The desired incremental change
for this thesis is 0.999.
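The learning rate rule can be written as a short Python function applied once per epoch. This is a sketch of the rule exactly as stated in the text; the function and parameter names are hypothetical, and the guard against a zero current error is an added assumption. Note that, for positive errors, a ratio below 0.999 already implies a negative difference, so as literally stated both tests fire only when the epoch error has grown.

```python
def update_learning_rate(alpha, prev_error, curr_error,
                         min_ratio=0.999, factor=2.0):
    """Return the learning rate to use for the next epoch.

    The rate is divided by `factor` when the ratio of previous to
    current total error falls below `min_ratio`, or when the total
    error has increased; otherwise it is left unchanged.
    """
    # Guarding against division by zero is an assumption of this sketch.
    ratio_low = curr_error > 0.0 and (prev_error / curr_error) < min_ratio
    increased = (prev_error - curr_error) < 0.0
    if ratio_low or increased:
        return alpha / factor
    return alpha


# Error fell from 1.0 to 0.9: rate kept. Error rose to 1.1: rate halved.
kept = update_learning_rate(0.5, 1.0, 0.9)
halved = update_learning_rate(0.5, 1.0, 1.1)
```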

To verify that the learning rate modification increased the network convergence, a
series of tests was performed to measure the average total error of the network with and
without the variable learning rate. The recurrent network was configured with 1 input, 1
sigmoidal output, and 1 hidden sigmoidal unit. The training data set contained a
pseudo-random number sequence 1024 vectors long. Chapter IV contains the results of these
tests.

3.4  Testing 

The  network  was  tested  using  several  temporally  encoded  data  sets.  The  first  task 
was  to  train  the  network  to  learn  the  exclusive  OR  (XOR)  operation.  Although  XOR  is  not 
inherently  time  dependent,  the  network  will  learn  it  if  the  output  is  delayed  for  a  specified 
amount  of  time.  The  next  problem  attempted  was  to  teach  the  network  to  learn  an  internal 
state  problem  (22:97-100).  That  is,  the  network  must  recognize  that  two  particular  events 
have  occurred  in  a  prescribed  order,  regardless  of  the  number  of  the  intervening  events. 
The network was then trained to predict the frequency response of a second order IIR
lowpass (Butterworth) filter. IIR is an acronym for "Infinite Impulse Response".
This was done by training the network on the impulse response of the filter, and then
testing the response of the network to various inputs.

3.4.1 Exclusive OR (XOR) Exclusive OR (XOR) is a disjoint region problem (see
Figure 3). This means that there are two disjoint regions in the decision space for each
class. The classes must be separated by at least two decision planes before an input value
can be correctly classified into one of the class regions.


Figure  3.  The  disjoint  region,  or  exclusive  OR,  feature  space.  No  single  decision  plane 
can  separate  the  regions  by  class. 

Multilayer  perceptrons  trained  with  back  propagation  have  demonstrated  an  ability 
to  learn  this  problem  very  well  (13:53-61).  The  use  of  multiple  layers  is  to  allow  the 
formation  of  multiple  decision  planes  within  the  feature  space. 

Since  a  recurrent  neural  network  can  be  viewed  as  a  multilayer  feedforward  neural 
network  which  has  been  folded  back  onto  itself  in  time  (18),  it  should  also  be  able  to 
solve  the  XOR  problem  just  as  well.  However,  some  alterations  to  the  data  set  need  to  be 
considered.  Namely,  the  desired  output  of  the  data  needs  to  be  delayed  a  specified  length 
of  time  in  order  to  accommodate  the  predictive  nature  of  the  recurrent  network.  These 
alterations  were  required  because  the  XOR  problem  is  not  a  good  test  of  a  recurrent  neural 
network.  There  is  no  time  dependency  within  the  XOR  problem  unless  it  is  physically 
manipulated  to  contain  timed  information. 

Thus, the XOR problem was included in this thesis in order to identify how the
recurrent network performs when given a temporally encoded spatial problem.

The fully connected recurrent network was configured with 2 external inputs, 1
sigmoidal output, and 4 hidden sigmoidal units. The learning rate started at 4.0. Sigmoidal
units were chosen because the desired output values are 0 and 1. Initially, the data consisted
of a randomly generated set of 1’s and 0’s as input while the output was the XOR of the
input delayed by two time steps. Each data vector was considered a separate time sample.
The  net  trained  on  a  randomly  generated  binarized  XOR  data  file  (that  is,  the  values  were 
either  0  or  1).  The  number  of  training  vectors  was  1024,  and  the  training  concluded  after 
20  epochs.  The  decision  threshold  for  correctness  was  0.5;  if  the  output  was  greater  than 
0.5,  the  output  was  considered  a  1,  and  if  the  output  was  less  than  0.5,  the  output  was 
considered  a  0. 
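
A temporally encoded XOR data set of the kind described above could be generated along the following lines. This is a minimal sketch and not the thesis code: the function name make_temporal_xor, the use of the C library generator rand(), and the choice of 0 for the first two targets (which have no corresponding input) are all assumptions.

```c
#include <assert.h>
#include <stdlib.h>

#define XOR_DELAY 2 /* output delay in time steps, as stated in the text */

/* Fills in1 and in2 with random bits and target with the XOR of the inputs
 * presented XOR_DELAY steps earlier. The first XOR_DELAY targets are set to
 * 0, which is an assumption made for this sketch. */
void make_temporal_xor(int n, unsigned int seed, int *in1, int *in2, int *target)
{
    assert(n > XOR_DELAY);
    srand(seed);
    for (int t = 0; t < n; t++) {
        in1[t] = rand() % 2;
        in2[t] = rand() % 2;
        target[t] = (t >= XOR_DELAY)
                        ? (in1[t - XOR_DELAY] ^ in2[t - XOR_DELAY])
                        : 0;
    }
}
```

Calling the function with a different seed changes the order in which the four XOR patterns are presented, which is how a separate test set with a different temporal presentation could be obtained.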

After  the  network  training  was  complete,  the  weights  saved  from  the  training  run 
were  used  as  test  weights.  The  network  was  tested  using  another  binarized  XOR  data  set 
generated  from  a  different  random  seed.  This  guarantees  that  the  temporal  presentation  of 
the  XOR  data  set  is  randomly  changed.  The  results  of  this  test  should  show  how  well  the 
network  weights  generalized  the  XOR  learning  law. 

Another separate training was performed to test the network’s ability to generalize
the complete XOR data set. That is, can the recurrent network learn more than just the
vertices of the XOR data set? To answer this question, the network was trained on an
analog XOR data set as opposed to the binarized XOR data set. The spatial distribution of
the analog training data set is displayed in Figure 4. The network configuration contained
2  inputs,  1  sigmoidal  output,  and  5  hidden  sigmoidal  nodes.  The  network  weights  were 
trained  for  300  epochs  through  the  512  vector-length  analog  data  set.  The  weights  were 
saved  and  used  to  test  a  1024  vector-length  analog  data  set.  In  addition,  the  two  binarized 
data  sets  were  also  tested  using  the  above  saved  weights.  If  the  network  can  generalize  the 
analog  XOR  data  set  in  a  true  spatial  sense,  then  it  would  be  expected  to  perform  perfectly 
on the binarized XOR data sets. Chapter IV contains all the results and discussion of these
tests.

3.4.2 Internal State Learning to represent internal state is considered a simple
sequential recognition task for humans. However, for traditional feedforward neural networks,
this is a nearly impossible task. This test should demonstrate the power of a simple
recurrent network on timed sequential signals.

As demonstrated by Williams and Zipser (22:97-100), let there be four inputs to the
network, with the lines corresponding to the letters a, b, c, and d respectively. The a and b lines
are the actual input decision lines and the c and d lines serve as distractor lines only. On
any  given  time  step,  a  randomly  chosen  input  line  is  given  a  value  of  1,  with  all  others 
given  the  value  of  0.  The  desired  output  for  the  network  is  1  on  the  time  step  immediately 
following  the  first  occurrence  of  b  following  an  a.  Otherwise,  the  desired  output  is  0. 
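
The desired output for this task can be written as a small state machine. The sketch below is an illustration under stated assumptions rather than the thesis code: the input is represented as a string of the characters a, b, c, and d (one per time step) instead of four binary lines, and the function name internal_state_targets is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Computes the desired output for each time step: 1 on the step immediately
 * following the first b that follows an a, and 0 otherwise. */
void internal_state_targets(const char *symbols, size_t n, int *target)
{
    assert(symbols != NULL && target != NULL);
    int armed = 0; /* 1 once an a has been seen and not yet matched by a b */
    int fire = 0;  /* 1 if the previous step completed an a ... b pair */
    for (size_t t = 0; t < n; t++) {
        target[t] = fire;
        fire = 0;
        if (symbols[t] == 'a') {
            armed = 1;
        } else if (symbols[t] == 'b' && armed) {
            fire = 1;
            armed = 0;
        }
    }
}
```

For the sequence a c b d b a b c, the targets are 0 0 0 1 0 0 0 1: the second b is ignored because no a precedes it since the last match, and the c and d symbols act only as distractors.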

For  this  task,  the  network  consists  of  4  external  inputs,  1  sigmoidal  output,  1 
sigmoidal  hidden  node,  and  an  initial  learning  rate  of  5.0.  The  data  set  contained  95  time 
samples  (vectors).  The  initial  weight  values  were  randomly  generated  from  the  interval 
[-1,1].  Figure  5  shows  the  recurrent  network  configuration  used  for  training  on  the  internal 
state  problem.  Because  the  network  algorithm  has  a  variable  learning  rate,  the  initial  value 
of  the  learning  rate  is  used  to  get  the  network  started  on  the  "right  track".  For  this  test,  if 
the  learning  rate  were  less  than  5.0,  the  network  would  still  converge  but  at  a  slower  rate. 
If the learning rate were greater than 5.0, the network total error would initially converge
rapidly  but  would  then  rapidly  diverge.  The  algorithm  would  then  lower  the  learning  rate 
to  half  of  its  original  value  and  continue  until  it  converges. 

The network was trained and tested on two data sets which contain randomly generated
input vectors. However, the time separation between an a and the b following it
differs between the two data files. This means that the occurrence of the specified state
transition differs between the two data sets. For the training set, there were 27 occurrences
of b while only 10 of these b’s followed consecutively after an a. The time separation
between ab pairs for the training set is illustrated in Table 1. For the test set, there were
15 occurrences of b while only 8 of these b’s followed consecutively after an a. The time
separation between ab pairs for the test set is illustrated in Table 2. Therefore a correct
prediction of the occurrence of the transition in the test set will demonstrate the network’s
ability to learn the internal state problem presented. The output of the network and the
results of this test are contained in Chapter IV.

Figure 5. Recurrent network configuration for learning to represent internal state. For
this task, the network consists of 4 external inputs, labeled a, b, c, and d, 1
output, 2 sigmoidal nodes, and an initial learning rate of 5.0. The feedback
nodes are the previous (t-1) values of both nodes. The entire bottom row of
nodes represents the input vector z(t).

3.4.3 Second Order IIR Lowpass Filter This test demonstrates the network’s
generalization  ability  in  simulating  a  linear  system.  A  second  order  lowpass  Butterworth 
filter  was  used  as  the  linear  system  for  this  test.  The  filter  has  a  normalized  cutoff  frequency 
of  0.1  and  is  described  by  the  following  difference  equation: 

y[t] = 0.0676 (x[t] + 2x[t-1] + x[t-2]) + 1.1422 y[t-1] - 0.4124 y[t-2]    (13)

A data set was generated, using this difference equation, to train the network to learn the
frequency response of the filter. The input to the network is the sampled test signal, and
the desired output of the network is the output of the difference equation offset by one time
step (t + 1). The offset is to accommodate the predictive nature of the recurrent network.
Since the input does not affect the output until time t + 1, the desired output value in the
data set must be located at that t + 1 position. The data set contained 128 data points.
Appendix C contains the C code (make_data.c) used to generate this data set.

Table 1. The time separation for ab pairs in the training data set. The separation is the
number of time steps between the occurrence of an a and the first occurrence of
a b in the training set. The occurrence list shows how many of the respective
separations exist within the data set.

ab pair separation    Occurrence
         1                2
         2                3
         4                1
         5                1
         6                1
         7                1
        18                1
                     total = 10

Table 2. The time separation for ab pairs in the test data set. The separation is the number
of time steps between the occurrence of an a and the first occurrence of a b in
the test set. The occurrence list shows how many of the respective separations
exist within the data set.

ab pair separation    Occurrence
         1                3
         2                1
         7                1
         9                1
        22                1
        29                1
                     total = 8

When  the  input  to  a  system  is  a  single  delta  function,  the  output  is  called  the  impulse 
response  of  the  system.  Because  the  Butterworth  filter  is  a  linear  system,  a  single  delta 
function  (impulse)  was  used  as  the  input  in  order  to  obtain  the  system’s  impulse  response.  A 
linear  system  is  completely  characterized  by  its  impulse  response  (3:143-144).  This  means 
that  if  the  linear  system’s  impulse  response  is  known,  the  response  to  a  complicated  input 
can  be  determined  by  decomposing  the  complicated  input  into  a  superposition  of  a  large 
number  of  appropriately  weighted  and  positioned  delta  functions.  The  overall  response  is 
then determined by summing the responses to all the individual delta functions. Therefore,
if the recurrent neural network is trained on the impulse response of the filter, the network’s
response should completely characterize this linear system, regardless of the complexity
of the input signal.
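
The superposition argument can be checked numerically on the filter itself. The following C sketch is not the make_data.c program from Appendix C; it is an illustration that assumes zero initial conditions, uses the coefficients of equation (13), and introduces the hypothetical function names butterworth_filter and superpose.

```c
#include <assert.h>
#include <stddef.h>

/* Applies equation (13) to x[0..n-1] and writes the result to y[0..n-1].
 * Samples before t = 0 are taken as zero (an assumption). */
void butterworth_filter(const double *x, double *y, size_t n)
{
    assert(x != NULL && y != NULL);
    for (size_t t = 0; t < n; t++) {
        double x1 = (t >= 1) ? x[t - 1] : 0.0;
        double x2 = (t >= 2) ? x[t - 2] : 0.0;
        double y1 = (t >= 1) ? y[t - 1] : 0.0;
        double y2 = (t >= 2) ? y[t - 2] : 0.0;
        y[t] = 0.0676 * (x[t] + 2.0 * x1 + x2) + 1.1422 * y1 - 0.4124 * y2;
    }
}

/* Builds the response to x by summing shifted copies of the impulse
 * response h, each weighted by the corresponding input sample. */
void superpose(const double *h, const double *x, double *y, size_t n)
{
    assert(h != NULL && x != NULL && y != NULL);
    for (size_t t = 0; t < n; t++) {
        y[t] = 0.0;
        for (size_t k = 0; k <= t; k++) {
            y[t] += x[k] * h[t - k];
        }
    }
}
```

Filtering a unit step directly and superposing the impulse response over the same step should agree to within rounding error, which is the property the network is expected to inherit if it learns the impulse response.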

The network consisted of 1 external input, 1 output, and 1 sigmoidal hidden node.
For this test, the output node was defined as a linear function. Figure 6 shows the network
configuration for the tests.

The network was trained on the impulse response of the filter. This means that the
desired output of the network was the actual output of the Butterworth filter when the
input was a single impulse. The input to the network was a single impulse. After training,
the  following  input  signals  were  used  to  test  the  network's  ability  to  simulate  the  filter:  a 
unit  step,  two  different  sinusoids,  and  a  pseudo-random  signal  (to  simulate  white  noise). 
Again,  for  this  test,  the  network  was  trained  only  on  the  impulse  response  of  the  filter. 

After the input signal was applied to the network, the output was processed through
an FFT (Fast Fourier Transform) algorithm, and the network frequency spectrum was
compared to the desired frequency response of the filter. More specifically, the frequency
spectrum of the output of the difference equation was compared with the frequency spectrum
of the output of the recurrent network. The comparison was performed using all four
input signals separately. The results of these tests are highlighted in Chapter IV.

Figure 6. Recurrent network configuration for learning to simulate the frequency response
of a linear system. For this task, the network consisted of 1 external
input (the input signal to the filter), 1 linear output (the "filtered" input), 1
sigmoidal hidden unit, and a variable learning rate (initially 0.02 for a linear
output unit).


3.5  Applications 

If  this  real-time  recurrent  learning  network  can  simulate  the  response  of  a  linear 
system,  the  next  straightforward  application  would  be  to  test  the  predictive  ability  of  the 
network  on  real-time  problems.  Two  problems  of  particular  interest  are  described  as 
follows:  predicting  3-D  head  position  in  time,  and  voice  data  reconstruction. 

3.5.1 Predicting 3-D Head Position in Time Given the x, y, and z coordinates of
a pilot’s head in Euclidean space, the recurrent network should be able to predict what the
future position of the pilot’s head will be based upon the pilot’s previous head position.
For this application, the data set (provided by the Aeronautical Systems Division, Wright-Patterson
AFB) contained raw position coordinates as the input vector, and the actual
position coordinates for where the head position was in time as the desired output vector.
These  position  vectors  were  divided  into  separate  coordinate  positions.  That  is,  the  input 
and  desired  output  for  the  x-axis  was  extracted  into  a  separate  data  set,  and  likewise  for 
the  y-axis  and  z-axis  data. 

The network configuration consisted of 1 input (the current head position at time t),
1 sigmoidal output (the predicted head position at time t + r, where r is some arbitrary
time), and one sigmoidal hidden unit. The desired output in the data set was offset by 2
time steps. This means that for a given input position, the desired output position is the
actual position 2 time steps in the future. There were 8997 position samples in the
data set. The network was trained on the first 1000 samples of the data set for 400 epochs
with an initial learning rate of 3.0. Following training, the network weights were used to
test the remaining 7997 data points to see how well the recurrent network could predict
the respective coordinate position.

The training results were then compared to the results of a statistical prediction algorithm
which theoretically produces the best linear approximation. The statistical prediction
algorithm is fully described in Appendix E. This comparison is expected to show how the
recurrent  network  performs  with  respect  to  the  best  linear  prediction.  Chapter  IV  contains 
all  the  results  of  this  application  of  the  recurrent  network. 

3.5.2  Voice  Data  Reconstruction  For  the  task  of  voice  data  reconstruction,  the 
recurrent  network  was  required  to  learn  the  difference  between  fricative  (noisy  speech) 
and voiced (non-noisy) speech samples. More precisely, given a select set of input features
which should uniquely describe a sampled portion of speech, the recurrent network was
to classify whether that portion of speech was fricative (class 1) or voiced (class 0). If the
sampled portion of speech was classified as a fricative, noise was added to that portion of
the signal on reconstruction, whereas, if the sampled portion was classified as voiced, no
noise was added during reconstruction. Thus the reconstructed speech signal would regain
most of the high frequency content it lost before transmission (20).

The  data  set  for  this  application  consisted  of  four  features  computed  from  a  variable 
width  sample  of  speech.  The  first  feature  was  the  total  energy  contained  in  the  sample.  The 
second  feature  was  the  number  of  zero-crossings  that  occurred  during  the  sample  period 
divided  by  the  sample  window  length.  The  third  feature  was  the  number  of  slope  changes 
divided  by  the  sample  window  length.  And  the  fourth  feature  was  the  total  energy  below 
500  hertz  within  the  sample  window.  The  desired  output  was  a  classification  based  upon 
whether  the  input  features  described  a  noisy  portion  of  speech  or  not. 
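
The first three features could be computed from one window of samples roughly as shown below. This is a hedged sketch, not the feature extraction used in the thesis: the definitions of energy (sum of squared samples), zero crossing (sign change between adjacent samples), and slope change (sign change between adjacent first differences) are assumptions, as are the names SpeechFeatures and speech_features. The fourth feature, the energy below 500 hertz, is omitted because it depends on a sampling rate and filtering method that the text does not give.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double energy;            /* sum of squared samples in the window */
    double zero_cross_rate;   /* zero crossings divided by window length */
    double slope_change_rate; /* slope sign changes divided by window length */
} SpeechFeatures;

/* Computes three of the four features for the window s[0..n-1]. */
SpeechFeatures speech_features(const double *s, size_t n)
{
    assert(s != NULL && n > 0);
    SpeechFeatures f = {0.0, 0.0, 0.0};
    size_t zero_crossings = 0;
    size_t slope_changes = 0;
    for (size_t i = 0; i < n; i++) {
        f.energy += s[i] * s[i];
        /* Adjacent samples on opposite sides of zero count as a crossing. */
        if (i >= 1 && ((s[i - 1] < 0.0) != (s[i] < 0.0))) {
            zero_crossings++;
        }
        /* Adjacent first differences of opposite sign count as a slope change. */
        if (i >= 2) {
            double d_prev = s[i - 1] - s[i - 2];
            double d_curr = s[i] - s[i - 1];
            if (d_prev * d_curr < 0.0) {
                slope_changes++;
            }
        }
    }
    f.zero_cross_rate = (double)zero_crossings / (double)n;
    f.slope_change_rate = (double)slope_changes / (double)n;
    return f;
}
```

Noisy (fricative) windows would be expected to show higher zero-crossing and slope-change rates than voiced windows, which is presumably why these features were chosen.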

Previous  attempts  at  learning  the  data  set  used  a  feedforward  neural  network  trained 
with  backpropagation.  This  network  contained  4  inputs,  4  hidden  sigmoidal  nodes,  and 
a  2  class  output.  After  training,  the  network  weights  were  saved  and  used  as  previously 
described  in  the  reconstruction  process  to  classify  noisy  or  non-noisy  data.  Training 
accuracy was 87%. However, this feedforward network would not make the proper
decisions  regarding  the  noise  classification  of  the  data  when  used  in  the  reconstruction 
program.  Therefore,  another  network  architecture  was  sought  out  to  attempt  to  solve  the 
problem:  the  recurrent  network. 

The  recurrent  network  configuration  consisted  of  4  inputs,  1  sigmoidal  output,  and 
no hidden units. The network was trained for 400 epochs on a data set containing 1500 data
samples.  The  learning  rate  was  initialized  to  0.02.  If  the  learning  rate  were  greater  than 
0.02,  the  total  error  for  a  single  epoch  would  be  exceedingly  large,  causing  the  network  to 
catastrophically  diverge. 

Following  training,  the  network  weights  were  saved  and  used  in  the  reconstruction 
program.  Within  the  reconstruction  program,  the  weights  were  used  with  the  transmitted 
signal to reconstruct the speech pattern, adding noise where needed based upon the network’s
classification decision. Chapter IV contains all the results of this application of the
recurrent network.

3.6 Summary

The  methodology  for  developing  and  testing  the  RTRL  algorithm  has  been  described. 
The  dynamics  of  the  RTRL  algorithm  were  described  and  the  modifications  to  the  original 
algorithm  were  outlined.  Next,  the  testing  methodology  used  in  this  thesis  was  described. 
The  results  of  these  tests  show  how  robust  the  recurrent  network  is  to  learning  temporal 
XOR,  representing  internal  state,  and  simulating  a  linear  system.  The  predictive  ability  of 
the  recurrent  network  was  then  applied  to  two  problems:  head  position  tracking,  and  voice 
data reconstruction. Chapter IV contains the results and a discussion of these tests.

IV. Results and Discussion


Chapter  III  covered  details  of  the  development,  modification,  and  testing  of  the 
RTRL  algorithm  as  a  viable  part  of  a  function  prediction  scheme.  This  chapter  contains 
the  results  of  these  tests,  including  a  section  on  the  recurrent  network’s  application  to  two 
function prediction problems: 3-D head position tracking, and voice data reconstruction.
The  results  are  presented  in  the  same  order  as  they  appeared  in  Chapter  III. 

4.1  Modifications 

To verify that the learning rate modification improved network convergence, a
series of tests was performed to measure the average total error of the network with and
without the variable learning rate. The recurrent network was configured with 1 input, 1
sigmoidal output, and 1 hidden sigmoidal unit. The training data set contained a pseudo-random
number sequence 1024 vectors long. The recurrent network was trained on this
data  set  for  500  epochs.  The  results  displayed  do  not  indicate  that  the  network  learned 
to  predict  the  pseudo-random  sequence.  Rather,  the  plots  simply  illustrate  the  difference 
between  using  the  variable  learning  rate  modification  and  a  fixed  learning  rate. 

Figure 7a) displays the network training error when the learning rate is fixed at 4.0.
The total error decreases to a point and then diverges and becomes unstable. This implies
that the learning rate is not small enough to account for small weight variations required
within the network. Figure 7b) shows the exact same training as before, except with a
variable learning rate, initially set at 4.0. The network continued to converge throughout
the 500 epochs. The sharp drops in the error plot show when a change in the learning rate
occurred. The final learning rate value at the end of the training was 0.5. Using this final
value as the starting value for a fixed learning rate test, the network was trained again on
the same data set. Figure 7c) displays the results of this last test. Note that the network
continues to smoothly converge throughout the training run. Yet at the last epoch, the
total error was still not as low as that for the variable learning rate. This shows that a
large learning rate initially, followed by a successively decreasing learning rate, converges
rapidly when possible, and slowly when necessary. Therefore, the variable learning rate
modification was shown to be successful in increasing the network convergence to an
overall lower total error.

Figure 7. Results of using a variable learning rate. The total error versus epoch number
shows training using a) a fixed learning rate, b) a variable learning rate, and c)
a lower fixed learning rate.

4.2  Exclusive  OR 

The first problem used to test the recurrent network’s ability to learn was the classic
XOR problem. The recurrent network was initially trained on the binarized data set
described in Chapter III. The learning accuracy for this binarized training data set
was 100% after 20 epochs, with a total squared error on the final epoch of 0.032. The
decision threshold for correctness was 0.5; if the output was greater than 0.5, the output
was considered a 1, and if the output was less than 0.5, the output was considered a 0.
However, this accuracy was meaningless unless the network’s generalization ability was
tested on a separate data set.
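
The accuracy figures in this section follow from applying the 0.5 decision threshold to each network output and counting agreements with the desired output. A minimal C sketch of that scoring is given below. The function name threshold_accuracy is hypothetical, and treating an output of exactly 0.5 as a 0 is an assumption, since the text does not say how that case is handled.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the percentage of outputs whose thresholded value matches the
 * desired binary value. Outputs greater than 0.5 are counted as 1; all
 * others, including exactly 0.5 (an assumption), are counted as 0. */
double threshold_accuracy(const double *output, const int *desired, size_t n)
{
    assert(output != NULL && desired != NULL && n > 0);
    size_t correct = 0;
    for (size_t i = 0; i < n; i++) {
        int decision = (output[i] > 0.5) ? 1 : 0;
        if (decision == desired[i]) {
            correct++;
        }
    }
    return 100.0 * (double)correct / (double)n;
}
```

For example, outputs of 0.9, 0.1, 0.6, and 0.4 scored against desired values 1, 0, 0, and 0 give three correct decisions out of four, or 75 percent.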

For  this  test,  the  weights  saved  from  the  training  run  were  used  to  test  a  separate 
binarized  data  set  generated  using  a  different  randomization  seed.  During  the  test,  the 
weights remained fixed, and the network processed the test data only once. In this test run,
there  were  zero  prediction  errors.  Thus,  the  generalization  accuracy  of  the  network  was 
100%  when  tested  on  the  separate  binarized  test  set.  In  fact,  there  were  zero  errors  for  four 
other  binarized  test  sets,  all  of  which  were  generated  from  separate  seeds. 

However,  a  separate  test  was  performed  to  see  how  the  network  learns  when  trained 
on  an  analog  data  set.  After  300  training  epochs  (512  input  vectors)  on  the  analog  data 
set,  the  training  accuracy  was  98.1%  correct,  based  on  a  decision  threshold  equal  to  0.5. 
The  network  weights  were  saved  and  used  to  test  a  1024  vector-length  test  set  containing 
randomly  generated  analog  values.  After  testing,  the  prediction  accuracy  was  93.1% 
correct.  Then  two  binarized  data  sets  were  tested  using  the  weights  computed  from  the 
training  run  on  the  analog  XOR  data  set.  The  results  of  the  binarized  data  tests  were  as 
follows: for the first binarized data set, the testing accuracy was 91.3% correct, and for
the second binarized data set, the testing accuracy was 90.8% correct.

These results were unexpected. It was expected that the network trained
on the analog XOR data set would be able to exactly learn the corner values (vertices).
However, the above results show that almost 10% of the data points in the binarized test
set were incorrectly classified. This implies that the network did not learn the true spatial
XOR problem. Rather, it learned the spatio-temporal XOR problem. In other words, the
network learned the spatial XOR problem as presented in a temporal sequence.

To further demonstrate that the network did not learn the true XOR problem, Figure
8 contains a spatial distribution plot of the analog XOR data set which identifies the points
which were classified correctly or incorrectly following testing. The diamonds represent
the points within XOR space which the network incorrectly classified. Notice how evenly
distributed the misclassified points are throughout the XOR regions. There were bad
decisions made in every region, even for points very close to the vertices (corners). If the
recurrent network had truly learned the spatial XOR problem, the bad decisions would be
expected to lie close to the intersecting lines (cross-hairs) between the respective XOR
regions.

Two more analog XOR data sets were created to see if the spatial decision regions
change as the temporal presentation of the analog XOR data changes. These new analog
XOR sets are identical except that the input sequence of XOR data is changed. The
first set will be referred to as test case 1, and the other as test case 2. Using the same
weights generated from training on the previously mentioned analog training set, the
testing accuracy for test case 1 was 90.1% correct, and for test case 2 was 90.5% correct.
Figure 9a) displays the spatial decisions for test case 1 and Figure 9b) displays the
spatial decisions for test case 2. This time, notice the difference between the spatial location of
the misclassified points in Figure 9a) and the spatial location of the misclassified points in
Figure 9b). Since the temporal presentation of the data set changed, the decisions made
by the network changed also.

Therefore, it is suggested that the actual decision region the recurrent network uses
is described by a nonlinear spatio-temporal mapping. Previous research suggests that the
recurrent network learns the XOR problem by organizing itself into an appropriately layered
feedforward network (22:96-97). However, if this were truly the case, the recurrent
network would simply create a clearly discernible spatial decision region as was previously
expected. But as the previous results have shown, no clear spatial decision region
was formed. The recurrent network’s XOR decision was made based on the temporal
presentation of the spatial information contained in the data set.

Figure 8. XOR spatial distribution decision plot. The diamonds represent the points in
XOR space that the network incorrectly classified. There were bad decisions
made in every region, even for points very close to the vertices (corners). No
clear spatial decision region can be found. All of the incorrect decisions were
expected to be located near the cross-hair lines.

Figure 9. (a) XOR spatial distribution decision plot for test case 1. (b) XOR spatial
distribution decision plot for test case 2. The diamonds represent the points in
XOR space that the network incorrectly classified.

4.3  Internal  State 

The network’s ability to represent internal state is outlined in the following plots. In
Figure 10, the output of the network is compared to the desired training output in time.
As outlined in Chapter III, the training data set consisted of a randomly generated input
line (a, b, c, or d), one of which was set equal to 1 and the rest set equal to 0. The desired
output is 1 on the time step immediately following the first occurrence of a b following an a,
and 0 otherwise. Figure 10 shows how the trained network was able to predict the desired
output after only 20 epochs through a 95 vector data set. These weights were saved and
used to test the network’s generalization ability on a separate data set.

Figure 11 details how well the network can generalize from the training data set to
an application on a specific test data set. These results show 100% accuracy in predicting
the occurrence of a state transition of as much as 29 time steps apart. Refer to Tables
1 and 2 in Chapter III for the occurrence of ab pairs and the respective time separations.
These results imply a specific capacity to remember. Yet these results also imply that the
network will identify the first occurrence of a b line transition no matter how long ago the
first a transition occurred.

By analogy, the network configures itself as a set-reset state device (flip-flop). Refer to
Table 3 for the following discussion. The occurrence of an a sets the state of the hidden
unit high while the occurrence of a b during a low state resets the state of the hidden unit to
low. If the previous state was high, the output will be high. But if the previous state was
low and a b occurs, the output will remain low because the hidden unit has not been set
low. When the previous output was high, the recurrent weights for node one will always
reduce this high to a low, thus resetting the output until the next state transition occurs.

[Figure 10 plot: OUTPUT versus TIME (arbitrary units)]

Figure 10. Desired and actual output of the recurrent network after training. The decision
threshold for correctness was 0.5; if the output was greater than 0.5, the output
was a 1, and if the output was less than 0.5, the output was 0. The separation
of outputs greater than 0.5 represents the separation time between when an a
occurred and the first b occurred. Training accuracy was 100%.

Figure 11. Internal state test output of the recurrent network after training 20 epochs.
The decision threshold for correctness was 0.5. The separation of outputs
greater than 0.5 represents the separation time between when an a occurred
and the first b occurred. Testing accuracy was 100%.


Table 3. Training weights for the internal state problem. The first row contains the
weights for the output node and the second row contains the weights for the hidden node.

Columns: bias wt, a wt, b wt, c wt, d wt, recur wt 1, recur wt 2

Rows: output node, hidden node

-6.636 

-1.565 

-1.656 

4.242 

3.749 

-4.055 

H 

-2.877 

-3.461 

4.4  Second-Order IIR Lowpass Filter Simulation

As described in Chapter III, the recurrent network was trained to simulate a second-
order IIR lowpass (Butterworth) filter. The following input signals were used to test the
network’s ability to accurately simulate the filter’s response: a unit step, a cosine wave,
an inverted sine wave, and a pseudo-random number sequence (to simulate white noise).
Figure 12 shows the desired frequency response of the Butterworth filter.

4.4.1  Impulse Response  The network was trained on the impulse response of
the filter. This was accomplished by generating a data set using the difference equation
displayed in Eq 13. The input to the generator was an impulse δ(t), where δ(t) equals 1 for
t = 0 and δ(t) equals 0 otherwise. The output of the generator was used as the desired output
of the network. The desired output of the network was simply the output of the generator
delayed by one time step. In theory, the impulse response of a linear system completely
describes the system. Therefore, by training the network on the impulse response of the
system, the network should be expected to accurately simulate the response of a linear
system to any other input.
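A minimal sketch of such a generator is given below, assuming the usual direct-form second-order difference equation. The sign convention and the coefficient values are assumptions: `example_lowpass` holds the well-known second-order Butterworth design with its cutoff at one quarter of the sampling rate and is not taken from Eq 13. The names `biquad`, `biquad_run`, and `impulse_response` are hypothetical.

```c
#include <stddef.h>

/* y[t] = b0*x[t] + b1*x[t-1] + b2*x[t-2] - a1*y[t-1] - a2*y[t-2] */
struct biquad {
    double b0, b1, b2;
    double a1, a2;
};

/*
 * Illustrative coefficients only (assumption, not the thesis filter):
 * second-order Butterworth lowpass, cutoff at fs/4, bilinear transform.
 */
static const struct biquad example_lowpass = {
    0.2928932, 0.5857864, 0.2928932,
    0.0, 0.1715729
};

/* Run the difference equation over n samples; x and y may be the same array. */
void biquad_run(const struct biquad *c, const double *x, double *y, size_t n)
{
    double xd1 = 0.0, xd2 = 0.0;   /* x[t-1], x[t-2] */
    double yd1 = 0.0, yd2 = 0.0;   /* y[t-1], y[t-2] */
    size_t t;

    for (t = 0; t < n; t++) {
        const double xt = x[t];    /* read before y[t] is written (aliasing) */
        y[t] = c->b0 * xt + c->b1 * xd1 + c->b2 * xd2
             - c->a1 * yd1 - c->a2 * yd2;
        xd2 = xd1; xd1 = xt;
        yd2 = yd1; yd1 = y[t];
    }
}

/* Fill h[0..n-1] with the response to an impulse that is 1 at t = 0. */
void impulse_response(const struct biquad *c, double *h, size_t n)
{
    size_t t;

    for (t = 0; t < n; t++)
        h[t] = (t == 0) ? 1.0 : 0.0;
    biquad_run(c, h, h, n);
}
```

Because the filter is linear, the same `biquad_run` routine also produces the reference responses to any other test signal.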

Figure 13 contains the results of the network after training for 600 epochs on the
desired impulse response of the filter. The output of the network was processed through a
fast Fourier transform (FFT) algorithm and the results plotted in Figure 14. The weights
generated by the network after this training run were saved and used to test the network’s
response to other input signals.

Figure 12. Desired frequency response of the Butterworth filter.

Based on the results in Figure 14, the recurrent network could not completely
memorize the impulse response of the filter, and thus, a complete filtering of higher
frequency components could not be learned. This is shown by the non-zero response in the
region where no frequency components should be. However, the amplitude of the higher
frequency components will still be greatly attenuated. This does not imply that the
network did not learn to generalize the response of the Butterworth filter. The real test is
to apply the weights generated from this training to other input signals and compare the
results to the expected filter response.

4.4.2  Unit Step Response  Using the weights generated by the network after it was
trained on the Butterworth filter’s impulse response, the network was tested using a unit
step as the input signal. A unit step is defined as equal to 1 for t > 0 and equal to 0
otherwise. The network response to the step input is shown in Figure 15, and Figure 16
shows the frequency domain representation of the response to the same step input.

Figure 13. Comparison of the network’s impulse response to that of the desired impulse
response of the Butterworth filter after 600 training epochs. The impulse
response completely characterizes the filter.

Figure 14. Comparison of the network’s impulse frequency spectrum to that of the desired
frequency response of the Butterworth filter after 600 training epochs. Since
the recurrent network could not completely memorize the impulse response
of the filter, a complete filtering of higher frequency components could not be
learned. However, the amplitude of the higher frequency components
will be greatly attenuated.

The results plotted in Figure 15 show how well the recurrent network’s response
matched the expected response of the Butterworth filter. The one notable difference is the
lack of an overshoot in the network’s response. This is a feature common to a heavily
overdamped system, whereas the Butterworth filter’s response shows only slight damping.
Nevertheless, the network still showed an excellent ability to simulate the step response
of the filter. In the frequency domain results shown in Figure 16, small differences can
be noted throughout the plot. However, since the plot is log-linear, these differences
appear exaggerated.

4.4.3  Sinusoidal  Response  The  network’s  response  was  further  tested  by  using 
two  different  sinusoidal  waves  as  inputs  to  the  system.  The  first  sinusoid  was  a  cosine 
wave  that  completes  2  cycles  within  128  sample  points.  The  response  of  the  trained 
network  to  this  cosine  wave  is  shown  in  Figure  17.  Throughout  the  plot  of  the  response, 
the  network  response  very  closely  predicted  the  expected  response  of  the  Butterworth  filter. 

In  the  frequency  domain,  it  is  apparent  that  this  cosine  wave  was  completely  within 
the passband of the filter. Figure 18 displays the actual network spectral response compared
to  the  expected  Butterworth  spectral  response.  As  with  the  unit  step  response,  the  cosine 
frequency  response  of  the  network  so  closely  matched  the  cosine  frequency  response  of 
the  filter  that  no  significant  differences  can  be  noted. 

The second sinusoid used to test the recurrent network’s ability to simulate the
response of the Butterworth filter was an inverted sine wave that completes 4 cycles within
128 sample points. Figure 19 displays the response of the trained network compared with
the expected Butterworth filter response. It illustrates how closely the network response
predicted the expected response of the Butterworth filter. Figure 20 shows the corresponding
comparison in the frequency domain.
Figure 15. Comparison of the recurrent network’s response to a unit step input with the
Butterworth filter’s response to a unit step input.

Figure 16. Comparison of the recurrent network’s spectral response to a unit step input
with the Butterworth filter’s spectral response to a unit step input. The plot is
log-linear.

Figure 17. Comparison of the recurrent network’s response to a cosine wave input with
the Butterworth filter’s response to a cosine wave input.

Figure 18. Comparison of the recurrent network’s spectral response to a cosine wave
input with the Butterworth filter’s spectral response to a cosine wave input.
The plot is log-linear.

Figure 19. Comparison of the recurrent network’s response to an inverted sine wave
input with the Butterworth filter’s response to an inverted sine wave input.

Figure 20. Comparison of the recurrent network’s frequency response to an inverted sine
wave input with the Butterworth filter’s frequency response to an inverted
sine wave input.


4.4.4  Pseudo-Random Number Sequence Response  The last test of the recurrent
network’s ability to simulate the response of a Butterworth filter was to apply a broadband,
noisy signal to the input of the network. The noisy signal was approximated by a
pseudo-random number sequence in the range [-1,1]. Figure 21 shows the results of the
network’s response to a noisy input signal compared to the expected response of the
Butterworth filter to the same noisy signal. Although the network response does not exactly
follow the expected response, it follows the expected response closely enough to say that
the network has indeed learned to simulate the response of the Butterworth filter for a noisy
input signal.

In the frequency domain plot displayed in Figure 22, it is clear that the
frequency components in the cutoff region of the filter have been attenuated when compared
to the amplitude of the frequency components in the filter’s passband. As identified earlier
in the training of the impulse response, the recurrent network did not completely memorize
the impulse response of the Butterworth filter. Thus, those frequency components falling
outside the filter’s passband will only be attenuated and not completely cut off.

4.5  Predicting  3-D  Head  Position  in  Time 

The recurrent network configuration consisted of 1 input (the current head position
at time t), 1 sigmoidal output (the predicted head position at time t + τ, where τ is some
arbitrary time based on the sampling rate of the system), and 1 sigmoidal hidden unit. The
desired output in the data set was offset by τ = 2 time steps. This means that for a given
input position, the desired output position is the actual position displaced 2 time steps into
the future. There were 8997 position samples in the data set. The network was trained
on the first 1000 data points for 400 epochs with an initial learning rate of 3.0. Figure
23 illustrates how closely the recurrent network predicted the head’s y-position for τ = 2.
Only the y-position is displayed because the x- and z-position plots were as accurate as
the y-position plot.
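The construction of the offset training pairs can be sketched as follows. The function name `make_prediction_pairs` and its argument layout are hypothetical; the sketch only shows the pairing of each sample with the sample a fixed number of steps later.

```c
#include <stddef.h>

/*
 * Pair each position sample with the sample `tau` steps later.
 * input[t] = position[t], desired[t] = position[t + tau].
 * Returns the number of pairs written (n - tau), or 0 if n <= tau.
 * The input and desired arrays must each hold at least n - tau elements.
 */
size_t make_prediction_pairs(const float *position, size_t n, size_t tau,
                             float *input, float *desired)
{
    size_t t;

    if (n <= tau)
        return 0;
    for (t = 0; t + tau < n; t++) {
        input[t] = position[t];
        desired[t] = position[t + tau];
    }
    return n - tau;
}
```

With an offset of 2, a series of n samples yields n - 2 usable pairs, since the last two samples have no future value to serve as the desired output.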

Notice how the network prediction slightly lags behind the actual y-position. This
indicates that the network did not learn to accurately predict the pilot’s head position in
time. Portions of the network’s prediction are very close to the actual values, but this is not
consistent throughout the data set.

Figure 21. The results of the recurrent network’s response to a noisy input signal
compared to the expected response of the Butterworth filter to the same noisy
signal.

Figure 22. The results of the recurrent network’s frequency response to a noisy input
signal compared to the expected spectral response of the Butterworth filter to
the same noisy signal.

Figure 23. Predicting head position training results. The recurrent network trained for
400 epochs on 1000 data points. Only a portion of the results is displayed.
The network output was trained to predict the value of the input function two
time steps in the future. This plot is a comparison of the network output
y(t + 2) to the actual y-position at time t + 2.


Figure 24. This plot is a comparison of the recurrent network’s error e(t) to the statistical
prediction error.

So, how well does the network compare to the best linear prediction? A statistical
prediction algorithm was used for this comparison. The same data set used to train the
network was used in the statistical prediction algorithm. The results of this comparison
(shown in Figure 24) show that the network only slightly outperforms the statistical
predictor. The total mean squared error of the network was 0.000174 while the total
mean squared error of the statistical predictor was 0.000220. However, in terms of
prediction, portions of the network’s output were very close to the actual signal whereas
the statistical prediction consistently lagged behind the actual signal. On this point, the
network performance was better.
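An error figure of this kind can be computed as sketched below. The text does not state how the "total mean squared error" was normalized, so the division by the number of samples is an assumption, and the function name is hypothetical.

```c
#include <stddef.h>

/*
 * Mean of the squared differences between predicted and actual values.
 * Assumption: the error is averaged over all n samples.
 */
double mean_squared_error(const double *predicted, const double *actual, size_t n)
{
    double sum = 0.0;
    size_t t;

    if (n == 0)
        return 0.0;
    for (t = 0; t < n; t++) {
        const double diff = predicted[t] - actual[t];
        sum += diff * diff;
    }
    return sum / (double)n;
}
```

Applying the same routine to the network output and to the statistical predictor's output over the same samples gives two directly comparable numbers.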

Thus, from these results, the recurrent network proves to be a more robust function
predictor than the best linear predictor. This is true for three reasons. One, the network
does not require that the entire temporal sequence be known while the statistical predictor
does. Two, the network does accurately predict portions of the pilot’s head position in time
while the statistical predictor always lags. Three, the network can be trained in real-time
and updated as necessary to accommodate unexpected future events whereas the statistical
predictor cannot.

4.6  Voice  Data  Reconstruction 

For  the  task  of  voice  data  reconstruction,  the  recurrent  network  was  required  to  learn 
the  difference  between  a  fricative  (noisy)  and  a  voiced  (non-noisy)  portion  of  speech.  The 
recurrent  network  configuration  consisted  of  4  inputs,  1  sigmoidal  output,  and  no  hidden 
units.  The  network  was  trained  for  400  epochs  on  a  1500  vector-length  data  set.  The 
learning  rate  was  initialized  to  5.0.  The  data  set  for  this  application  consisted  of  four 
features  computed  from  a  variable  width  sample  of  speech.  The  first  feature  was  the  total 
energy  contained  in  the  sample.  The  second  feature  was  the  number  of  zero-crossings  that 
occurred  during  the  sample  period  divided  by  the  sample  window  length.  The  third  feature 
was  the  number  of  slope  changes  divided  by  the  sample  window  length.  And  the  fourth 
feature  was  the  total  energy  below  500  hertz  within  the  sample  window.  The  desired  output 
was  a  classification  based  upon  whether  the  input  features  described  a  fricative  (class  1) 
or voiced (class 0) portion of speech. The classification was assigned purely by human
judgment.
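The three time-domain features can be sketched in C as shown below. This is an illustration under stated assumptions rather than the feature extractor used for the thesis: a zero-crossing is taken to be a sign change between adjacent samples, a slope change is taken to be a sign change between adjacent first differences, and the fourth feature (energy below 500 hertz) is omitted because it requires a spectral estimate. All names are hypothetical.

```c
#include <stddef.h>

struct speech_features {
    double energy;             /* sum of squared samples in the window  */
    double zero_cross_rate;    /* zero-crossings / window length        */
    double slope_change_rate;  /* slope sign changes / window length    */
};

/* Compute the time-domain features of one window of n samples. */
struct speech_features window_features(const double *x, size_t n)
{
    struct speech_features f = { 0.0, 0.0, 0.0 };
    size_t zero_crossings = 0, slope_changes = 0, t;

    if (n == 0)
        return f;

    for (t = 0; t < n; t++)
        f.energy += x[t] * x[t];

    /* A negative product means adjacent samples have opposite signs. */
    for (t = 1; t < n; t++)
        if (x[t - 1] * x[t] < 0.0)
            zero_crossings++;

    /* A negative product of adjacent differences marks a local extremum. */
    for (t = 1; t + 1 < n; t++)
        if ((x[t] - x[t - 1]) * (x[t + 1] - x[t]) < 0.0)
            slope_changes++;

    f.zero_cross_rate = (double)zero_crossings / (double)n;
    f.slope_change_rate = (double)slope_changes / (double)n;
    return f;
}
```

Dividing the counts by the window length keeps the features comparable across the variable-width windows described above.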

Figure 25 displays the results of the recurrent network after training for 400 epochs
through the 1500 vector-length data set. The decision accuracy of 98.4% was based upon
whether the network output was greater than 0.5 for a class 1 or less than 0.5 for a class 0.
The few classification errors the network made were for noisy regions that contain higher
than normal energy (such as the sound for the letter "k").
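The decision accuracy described above can be computed as in the following sketch. The function name is hypothetical, and an output of exactly 0.5, which the text leaves undefined, is assumed here to count as class 0.

```c
#include <stddef.h>

/*
 * Percentage of outputs that fall on the correct side of the 0.5 threshold.
 * Assumption: an output of exactly 0.5 is decided as class 0.
 */
double decision_accuracy(const double *output, const int *desired_class, size_t n)
{
    size_t correct = 0, t;

    if (n == 0)
        return 0.0;
    for (t = 0; t < n; t++) {
        const int decided = (output[t] > 0.5) ? 1 : 0;
        if (decided == desired_class[t])
            correct++;
    }
    return 100.0 * (double)correct / (double)n;
}
```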
Figure 25. Voice data classification results after training on 400 epochs. The few
classification errors the network made are for noisy regions that contain higher
than normal energy (such as the sound for the letter "k").


Following training, the network weights were saved and used in the reconstruction
program. Within the reconstruction program, the weights were used with the transmitted
signal to reconstruct the speech pattern, adding noise where needed based upon the
network’s classification decision. Figure 26 illustrates how the network weights were used in
the reconstruction program to classify a given speech pattern. The fricative regions
classified as "1" were noisy regions where noise was added to the signal during reconstruction.
In the voiced "0" regions, no noise was added to the signal during reconstruction. Using
this network’s decisions, the reconstruction program was able to reproduce an intelligible
voice signal, which it could not do using the decisions made by the feedforward network
previously used.
[Plot: Recurrent Network Decisions Used in Voice Reconstruction Program; AMPLITUDE versus TIME (125 us)]

Figure 26. Recurrent network decisions made within the reconstruction program. Areas
classified as "1" are considered fricatives (noisy regions), and noise was added
to that portion of speech during reconstruction. No noise was added to the
voiced regions classified as "0".
4.7  Summary

The recurrent neural network was tested using several temporally encoded data
sets. From these test results, the network demonstrated the ability to learn the internal
state problem and the second-order IIR lowpass Butterworth filter problem. Specifically
noted was the network’s ability to learn both the temporal response and frequency domain
response of the Butterworth filter by training only on the filter’s impulse response.

The recurrent network was also tested on the classic XOR problem. However, it
was discovered that the recurrent network did not learn this problem in the classic spatial
sense. Rather, it learned the problem in a spatio-temporal decision space. No clear
decision region was found which could be used to delineate the correct decisions from the
incorrect decisions, as shown by Figure 9.

Following testing, the recurrent network was applied to two problems: head position
tracking and voice data reconstruction. The accuracy with which the network predicted the
pilot’s head position showed the recurrent network’s ability to predict trajectories and
motion as well as, or slightly better than, the best linear predictor. The application of the
network to the reconstruction of voice data showed the network’s ability to make accurate
decisions based upon the learning of temporally encoded sequences. Thus, through both
of these applications, the recurrent network displayed a high degree of generalization.
Therefore, the extension of the recurrent neural network’s application to a wide range of
differing problems would be a straightforward process.
V.  Conclusions and Recommendations


This  thesis  effort  has  sought  to  encode  the  RTRL  algorithm,  test  it,  and  use  it  to 
predict  the  future  value  of  a  function  based  upon  the  function’s  history.  This  process, 
called  function  prediction,  is  extremely  important  to  many  Air  Force  applications. 

5.1  Conclusions 

The RTRL algorithm has demonstrated the ability to learn several time dependent
functions. From the test results outlined in Chapter IV, the network demonstrated the
ability to learn the internal state problem and the second-order IIR lowpass Butterworth
filter problem. Also, the recurrent network demonstrated the ability to temporally learn the
classic XOR problem.

However, in an exact sense, it could not learn the true spatial mapping of the XOR
problem because of the temporal information contained in the sequential data. This was
evidenced by the fact that no clear spatial decision region could be used to delineate the
correct decisions from the incorrect decisions, as Figure 9 illustrates.

The recurrent network was also applied to two problems: head position tracking
and voice data reconstruction. The accuracy with which the network predicted the pilot’s
head positions showed the recurrent network’s ability to predict trajectories and motion.
The application of the network to the reconstruction of voice data showed the network’s
ability to learn temporally encoded sequences.

The recurrent network has demonstrated a high degree of accuracy as a function
prediction tool. In the Butterworth filter application, the network not only simulated the
response of the filter to various input signals, it did so predictively. In other words, the
output of the network was a prediction of the output of the filter for the next time step.
This prediction was based upon the current signal activation and the previous network
response. For the head position tracking problem, the recurrent network demonstrated a
high degree of accuracy in predicting spatial head position at either 2 or 5 time steps in the
future. The time step was arbitrary and was based on the sampling rate of the data being
analyzed.

5.2  Recommendations 

A  recommendation  that  may  improve  the  rate  of  convergence  of  the  network  entails 
the  use  of  a  network  reduction  scheme.  One  such  method  makes  use  of  the  Ruck  saliency 
metric  (16, 17).  This  method  examines  the  responsiveness  of  the  network’s  output  to  its 
input  in  order  to  rank  the  network’s  nodal  usefulness.  This  way,  the  network  can  be  pruned 
down,  reducing  the  number  of  interconnection  weights  and  processing  nodes.  A  direct 
result  of  this  reduction  would  be  a  great  increase  in  computational  speed  and  network 
convergence. 

Another  recommendation  would  be  to  investigate  a  way  to  determine  what  the 
recurrent  neural  network  learned  during  the  training  process.  It  is  known  how  the  network 
learns  and  how  it  performs  when  trained,  but  it  is  still  not  exactly  known  what  the  network 
learns  in  order  to  make  accurate  decisions. 

5.3  Future  Research 

Much  more  research  is  needed  in  the  area  of  recurrent  networks.  Two  specific 
topics  are  brought  to  light  by  this  thesis.  One  is  the  concept  of  the  recurrent  network’s 
capacity  to  remember.  Does  the  network  really  remember  the  temporal  nature  of  the  task 
it  is  presented?  Does  it  forget  at  some  future  time?  How  does  the  logistic  squashing 
function  affect  the  network’s  capacity  to  remember?  The  second  topic  of  future  research 
is  the  spatio-temporal  mapping  of  the  recurrent  network’s  decision  region.  What  kind  of 
decision  region  does  the  recurrent  network  create  in  the  process  of  making  a  decision?  Do 
the internal state variables play an important role in the decision process? Is this decision
region  purely  a  temporal  mapping,  or  does  this  mapping  contain  both  spatial  and  temporal 
information? 
Appendix A.  Software Development


Appendix B contains the source listing for the modified RTRL algorithm developed
at AFIT called "RECNET" (short for recurrent neural network). RECNET was written in
"ANSI C" and has been successfully compiled and run on all of the following computer
systems: Silicon Graphics 4D/GTX, Silicon Graphics Personal IRIS 4D, NeXT NeXTstation,
and IBM-compatible "AT class" personal computers using Turbo C++. The main
program file is named "recnet.c".

A.1  File  Parameters 

RECNET  requires  two  data  files,  called  "parameters.dat"  and  "data.dat"  (default). 
"parameters.dat"  is  a  data  file  which  contains  the  following  three  numbers: 

num_epochs  learning_rate  random_number_seed

The "num_epochs" (integer) is the number of epochs for a specific training run. The
"learning_rate" (float) is the learning rate of the network. If the output nodes of the network are
defined as sigmoidal, the learning rate can be set to any value that works. If the output
nodes are defined as linear, the learning rate must be small (alpha < 0.5) for the network to
remain stable. The "random_number_seed" (integer) is the seed used to randomly generate
the weight matrix. The data file "data.dat" is the default name for the data file to be read
during training or testing. If a data filename is passed to RECNET at the command line,
RECNET will read that filename as the input data file. For example, the command

recnet mydatafile.dat

will execute RECNET using "mydatafile.dat" as the current input data file. To test a data
file, the command

recnet mytestfile.dat test

will execute RECNET using "mytestfile.dat" as the input test data file. The format of the
data file is as follows:

numinputs numoutputs numnodes numvectors
in1 in2 ... numinputs   des1 des2 ... numoutputs   (vector1)
 .   .          .         .    .          .        (vector2)
 .   .          .         .    .          .        (vector3)
 .   .          .         .    .          .        (vector4)
 .   .          .         .    .          .        (vector5)
 .   .          .         .    .          .        (vector6)
 ...                                               numvectors

where "numinputs" (integer) is the number of external inputs, "numoutputs" (integer)
is the number of external outputs, "numnodes" (integer) is the total number of processing
nodes (which includes the output nodes), "numvectors" (integer) is the total number of
input/desired-output vectors, "in1 in2..." (float) are the actual values to be read as inputs
(there should be 'numinputs' of these), and "des1 des2..." (float) are the actual desired
output values (there should be 'numoutputs' of these). Each vector is considered a separate
timed event.
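A reader for this file format might look like the sketch below. It is not the `read_data()` routine from the RECNET listing; the structure, the function names, and the sanity checks are hypothetical, and the sketch assumes that the "(vector1)" style labels shown above are documentation only and do not appear in the file.

```c
#include <stdio.h>
#include <stdlib.h>

struct recnet_data {
    int num_inputs, num_outputs, num_nodes, num_vectors;
    float *inputs;    /* num_vectors x num_inputs, row-major  */
    float *desired;   /* num_vectors x num_outputs, row-major */
};

/* Release the arrays owned by d. */
void free_recnet_data(struct recnet_data *d)
{
    free(d->inputs);
    free(d->desired);
    d->inputs = NULL;
    d->desired = NULL;
}

/* Read the header line and all vectors. Returns 0 on success, -1 on error. */
int read_data_file(FILE *fp, struct recnet_data *d)
{
    int v, k;

    d->inputs = NULL;
    d->desired = NULL;
    if (fscanf(fp, "%d %d %d %d", &d->num_inputs, &d->num_outputs,
               &d->num_nodes, &d->num_vectors) != 4)
        return -1;
    /* The node count includes the output nodes, so it cannot be smaller. */
    if (d->num_inputs < 1 || d->num_outputs < 1 ||
        d->num_nodes < d->num_outputs || d->num_vectors < 1)
        return -1;

    d->inputs = malloc(sizeof(float) * (size_t)d->num_vectors * (size_t)d->num_inputs);
    d->desired = malloc(sizeof(float) * (size_t)d->num_vectors * (size_t)d->num_outputs);
    if (d->inputs == NULL || d->desired == NULL)
        goto fail;

    for (v = 0; v < d->num_vectors; v++) {
        for (k = 0; k < d->num_inputs; k++)
            if (fscanf(fp, "%f", &d->inputs[v * d->num_inputs + k]) != 1)
                goto fail;
        for (k = 0; k < d->num_outputs; k++)
            if (fscanf(fp, "%f", &d->desired[v * d->num_outputs + k]) != 1)
                goto fail;
    }
    return 0;

fail:
    free_recnet_data(d);
    return -1;
}
```

Reading every vector before any computation begins matches the initialization behavior described in the next section.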

A.2  Environment 

RECNET  dynamically  configures  itself  using  the  data  contained  in  the  data  file 
header  line.  During  initialization,  memory  space  is  allocated  for  all  variables  and  all 
inputs  and  desired  outputs  are  read  in  before  any  computation  begins.  After  initialization, 
the  network  begins  training  on  the  data,  dynamically  adjusting  the  weights  until  either  the 
total  error  drops  below  O.OftOS,  or  the  total  number  of  epochs  have  been  reached. 

RECNET will display different information depending on which mode of operation is
selected. During training, RECNET prints the following information to the terminal screen.
First, it shows how it is configured by displaying the data file header line. Following
this, the epochwise total error is printed to show how the network is learning, or whether
or not it is learning. During testing, RECNET outputs the current configuration to the
terminal screen. In addition, it displays the names of the three data files it creates during
the test run. These files are described in the next section.

A.3  Output

After  network  training  is  complete,  several  output  data  files  are  created.  First,  the 
data file "weights.dat" is created. It contains a y vector listing for the very last timed input
vector  and  a  listing  of  the  complete  weight  matrix,  in  row-column  format,  after  training. 
In addition, the files "desired.dat" and "netout.dat" are created. The file "desired.dat"
contains  a  listing  of  the  desired  output  values,  and  the  file  "netout.dat"  contains  a  listing 
of  the  actual  network  output  values  corresponding  to  the  appropriate  desired  output  value. 
These  two  files  are  separate  to  aid  in  plotting  the  data. 

After  testing,  RECNET  creates  three  data  files.  They  are  described  as  follows: 
"testcheck.dat"  contains  a  comparative  listing  of  the  computed  network  output  and  the 
desired  network  output,  "testdes.dat"  contains  a  listing  of  the  desired  network  output, 
and "testout.dat" contains a listing of the computed network output. Again, these files are
separate to aid in plotting the data.
Appendix B.  Recurrent Neural Network Source Code


This appendix contains a listing of the modified real-time recurrent learning
algorithm source code and its supporting functions. The files "nrutil.c" and "ran1.c" were used
from the Numerical Recipes in C book (12).

/* definitions.h ***********************************************

   File containing function declarations and variable
   declarations for the main program called recnet.c.

   date: 30 May 91

   written by: Randall L. Lindsey

****************************************************************/

float *vector();
float **matrix();
float ***matrix3d();
float ran1();

FILE *ifp, *ofp;
int run=1;
char str[80], *datafile;
int nrows, ncols, i, j, k, l, m, n;
int epochs, a, t;
int num_inputs, num_outputs, num_nodes, num_vectors;
int seed, idum=1;
float alpha, J[2], sum, kron;
float *y, *s, *e, *yprime;
float **z, **d, **w, **delw;
float ***p, ***p_old, ***p_temp;

float sigmoid(float x);
void init_net();
void train_net();
void test_net();
void read_data();
void propagate();
void compute_output();
void compute_error();
void update();
void reset_delw_s();
void reset_p();
void save_weights();
void read_weights();
void check_file();

/***************************************************************/


/*** MACROS.H *******************************************************/
char junk_response[256];

#define fskip_line(A) fgets(junk_response, 256, A)

#define skip_line gets(junk_response)

#define rloopi(A) for(i=(A)-1;i>=0;i--)

#define rloopj(A) for(j=(A)-1;j>=0;j--)

#define rloopk(A) for(k=(A)-1;k>=0;k--)

#define rloopl(A) for(l=(A)-1;l>=0;l--)

#define rloopij(A,B) for(i=(A)-1;i>=0;i--) for(j=(B)-1;j>=0;j--)

#define loopi(A) for(i=0;i<A;i++)

#define loopj(A) for(j=0;j<A;j++)

#define loopk(A) for(k=0;k<A;k++)

#define loopl(A) for(l=0;l<A;l++)

#define loopij(A,B) for(i=0;i<A;i++) for(j=0;j<B;j++)

#define CREATE_FILE(A,B,C) if((A=fopen(B,"w")) == NULL) { \
        printf(strcat(C,": can't open for writing - %s.\n"),B);\
        exit(-1); }

#define OPEN_FILE(A,B,C) if((A=fopen(B,"r")) == NULL) { \
        printf(strcat(C,": can't open for reading - %s.\n"),B);\
        exit(-1); }

#define IABS(A) ((int)((-(A)<(A))?((A)):(-(A))))

/********************************************************************/


/* RECNET.C *********************************************************

   A recurrent neural network which follows the algorithm
   proposed by Williams and Zipser in their paper "A Learning
   Algorithm for Continually Running Fully Recurrent
   Neural Networks", Neural Computation 1, 270-280 (1989).

   date: 30 May 91
   update: 15 Jul 91

   written by: Randall L. Lindsey, GEO-91D

*********************************************************************/

#include <stdio.h>
#include "macros.h"
#include <math.h>
#include "definitions.h"
#include <string.h>

void main(int argc, char *argv[])
{
  switch (argc) {
    case 1:
      datafile="data.dat";  /* Default name of datafile. */
      check_file();         /* Check to see if the datafile name exists. */
      init_net();           /* Initialize and define all network variables.
                               Allocate memory for all vectors and matrices
                               and set initially to zero. Randomly set the
                               weight matrix using the pseudo-random number
                               generator. */
      read_data();          /* Read data vector array and desired output. */
      train_net();          /* Propagate inputs and update weights based on
                               gradient descent. */
      break;

    case 2:
      datafile=argv[1];     /* User specified name of datafile. */
      check_file();         /* Check to see if the datafile name exists. */
      init_net();           /* Initialize and define all network variables.
                               Allocate memory for all vectors and matrices
                               and set initially to zero. Randomly set the
                               weight matrix using the pseudo-random number
                               generator. */
      read_data();          /* Read data vector array and desired output. */
      train_net();          /* Propagate inputs, compute outputs, and
                               update weights based on gradient descent. */
      break;

    case 3:
      datafile=argv[1];     /* User specified name of datafile. */
      check_file();         /* Check to see if the datafile name exists. */
      init_net();           /* Initialize and define all network variables.
                               Allocate memory for all vectors and matrices
                               and set initially to zero. Randomly set the
                               weight matrix using the pseudo-random number
                               generator. */
      test_net();           /* Propagate inputs and compute outputs. */
      break;

    default:
      printf("\nUsage: net [datafilename.dat] [testflag]\n\n");
      break;
  }
}  /* End MAIN() of RECNET.C */


void train_net()   /* Written 10 Jun 91, RLL. */
{
  /* Begin main loop portion */

  ofp=fopen("error.trn","w");
  for(a=0;a<epochs;a++) {
    J[0] = J[1];
    J[1] = 0.;

    for(t=0;t<num_vectors;t++) {

      propagate();       /* Computes the state of the net at time t.
                            Store previous outputs y[t-1] as part of
                            the new input vector z[t][i]. Sum all
                            z[][]*w[][] inputs into the activation
                            vector s[t] for input into y[t]. */

      compute_error();   /* Computes the error at time t.
                            How far off are the outputs from the
                            desired values? Compute total error. */

      compute_output();  /* Compute the output y(t+1)=f[s(t)]. */

      update();          /* Computes delw(t), and p(t+1). Backprop
                            error through net and perform gradient
                            descent to calculate the delta weights. */

      reset_delw_s();    /* Reset delta weights and s[t] vectors
                            to zero for the next iteration. */
    }

    printf("%s %f\n","total error =",J[1]);   /* Print total error. */
    if ((a > 5) && (J[0]/J[1] < 0.95)) {
      alpha = alpha/20;
      printf("%f %f alpha = %f\n",J[0],J[1],alpha);
    }

    if (J[1] < 0.0005) {   /* If total error is less than a specific */
      save_weights();      /* fractional value (arbitrary), then exit. */
      printf("%d\n",a);
      exit(0);
    }

    fprintf(ofp,"%f\n",J[1]);

    reset_p();   /* Zero p_old[][][] matrix for next epoch. */
  }   /* End main loop portion */
  fclose(ofp);

  save_weights();   /* Save weights, input vector z, and desired
                       output to a data file for future use. */

  return;

}  /* end function train_net() */


void test_net()   /* Written 10 Jun 91, RLL. */
{
  /* Begin main loop portion */

  read_weights();   /* Read weight matrix and saved p states. */
  read_data();      /* Read data vector array and desired output. */

  ofp=fopen("error.tst","w");

  J[1] = 0.;

  for(t=0;t<num_vectors;t++) {

    propagate();       /* Store previous outputs y[t-1] as part of
                          the new input vector z[t][i]. Sum all
                          z[][]*w[][] inputs into the activation
                          vector s[t] for input into y[t]. */

    compute_output();  /* Compute the output y(t+1)=f[s(t)]. */

    compute_error();   /* Computes the error at time t.
                          How far off are the outputs from the
                          desired values? Compute total error. */

    reset_delw_s();    /* Reset delta weights and s[] vectors to
                          zero for the next iteration. */

    fprintf(ofp,"%f\n",J[1]);
  }   /* End main loop portion */
  fclose(ofp);

  ofp=fopen("testcheck.dat","w");
  loopi(num_vectors) {
    loopj(num_outputs)
      fprintf(ofp,"%f ",z[i][j+m]);
    loopj(num_outputs)
      fprintf(ofp,"%f ",d[i][j]);
    fprintf(ofp,"\n");
  }
  fclose(ofp);

  ofp=fopen("testdes.dat","w");
  loopi(num_vectors)
    loopj(num_outputs)
      fprintf(ofp,"%f\n",d[i][j]);
  fclose(ofp);

  ofp=fopen("testout.dat","w");
  loopi(num_vectors)
    loopj(num_outputs)
      fprintf(ofp,"%f\n",z[i][j+m]);
  fclose(ofp);

  printf("'testcheck.dat' contains test data.\n");
  printf("'testout.dat' contains net output test data.\n");
  printf("'testdes.dat' contains desired output test data.\n");

  return;

}  /* end function test_net() */


float sigmoid(float x)   /* Written 30 May 91, RLL. */
{
  static float max_val=50.;

  if (x > max_val)
    return 1.0;
  if (x < -max_val)
    return 0.0;

  return 1/(1 + exp(-x));

}  /* end sigmoid */

void init_net()   /* Written 10 Jun 91, RLL. */
{
  /* Read data from the input file "parameters.dat" */

  ifp=fopen("parameters.dat","r");
  fscanf(ifp,"%d %f %d",&epochs,&alpha,&seed);
  fclose(ifp);

  /* Read data from the input file datafile (user specified) */

  ifp=fopen(datafile,"r");
  fscanf(ifp,"%d %d %d",&num_inputs,&num_outputs,&num_nodes);
  fscanf(ifp,"%d",&num_vectors);
  printf("%d %d %d\n",num_inputs,num_outputs,num_nodes);
  fclose(ifp);

  m = num_inputs + 1;       /* # of external inputs */
  nrows = n = num_nodes;    /* # of rows for weight matrix */
  ncols = m + num_nodes;    /* # of cols for weight matrix */

  /* Allocate memory for vectors and matrices */

  e=vector(0,nrows-1);                          /* error vector */
  y=vector(0,nrows-1);                          /* output vector */
  s=vector(0,nrows-1);                          /* sum of weighted inputs */
  yprime=vector(0,num_nodes-1);                 /* dy/dw */
  w=matrix(0,nrows-1,0,ncols-1);                /* weight matrix */
  delw=matrix(0,nrows-1,0,ncols-1);             /* delta weights */
  z=matrix(0,num_vectors,0,ncols-1);            /* input vector array */
  d=matrix(0,num_vectors,0,ncols-1);            /* desired output array */
  p=matrix3d(0,nrows-1,0,ncols-1,0,nrows-1);    /* dy/dw */
  p_old=matrix3d(0,nrows-1,0,ncols-1,0,nrows-1);  /* dy/dw */

  /* Initialize variables to zero */

  J[0]=J[1]=0.0;

  loopij(num_vectors,ncols)
    z[i][j] = 0.;

  loopij(num_vectors,num_outputs)
    d[i][j] = 0.;

  loopi(nrows) {
    y[i] = s[i] = e[i] = 0.;
    loopj(ncols) {
      w[i][j] = delw[i][j] = 0.;
      loopk(nrows)
        p[i][j][k] = p_old[i][j][k]=0.;
    }
  }

  /* Initialize weight matrix using pseudo-random numbers */

  idum = -IABS(seed);
  ran1(&idum);
  loopi(nrows) {
    loopj(ncols) {
      w[i][j] = 2*ran1(&idum)-1.0;
      printf("%f ",w[i][j]);
    }
    printf("\n");
  }

  /* Initialize first input to 1 (non-external) */

  loopi(num_vectors)
    z[i][0] = 1.;

  return;

}


void read_data()   /* Written 10 Jun 91, RLL. */
{
  /* Read data file external inputs */

  ifp=fopen(datafile,"r");
  fskip_line(ifp);
  loopi(num_vectors) {
    loopj(num_inputs)
      fscanf(ifp,"%f",&z[i][j+1]);
    loopj(num_outputs)
      fscanf(ifp,"%f",&d[i][j]);
  }
  fclose(ifp);

  return;

}


void propagate()   /* Written 10 Jun 91, RLL. */

/* Computes the state of the net at time t, and
   initializes the z vector for time t. */

{
  /* Set previous outputs y[k]=y(t) as part of the next input z[t][k+m]. */

  loopk(nrows)
    z[t][k+m]=y[k];

  /* Sum all inputs into each of the k nodes. */

  loopk(nrows)
    loopi(ncols)
      s[k] += w[k][i] * z[t][i];
  return;

}


void compute_output()   /* Written 16 Jul 91, RLL. */
/* Computes the output at time (t+1), i.e. y(t+1). */
{
  /* Process each of the k nodes as Sigmoidal functions with input s[t]
     unless LINEAR is defined, in which only output nodes are linear
     functions of s[t] and the remaining hidden nodes remain Sigmoidal.
     The output computed is y[k] = y(t+1) = f(s[t]).
  */

#ifdef LINEAR
  loopk(num_outputs)
    y[k] = s[k];

  loopk(nrows-num_outputs)
    y[k+num_outputs] = sigmoid(s[k+num_outputs]);
#else
  loopk(nrows)
    y[k] = sigmoid(s[k]);   /* Here, y[k]=y(t+1). */
#endif

  return;

}


void compute_error()   /* Written 10 Jun 91, RLL. */
{
  /* Compute error at time t based on desired output values. Returns a
     zero error for t=0 on first epoch. */

  if ((t == 0) && (a == 0)) return;
  else
    loopk(num_outputs)
      e[k] = d[t][k] - y[k];

  /* Total error cumulated over each epoch. After each epoch, J=0. */

  loopk(num_outputs)
    J[1]+=0.5*e[k]*e[k];

  return;

}


void update()   /* Written 10 Jun 91, RLL.
                   Modified 28 Jun 91, RLL. */
{
  /* Compute change of weights at time t. delw is reset to zero at each
     iteration (time step), and p_old is p(t). */

  loopij(nrows,ncols)
    loopk(num_outputs)
      delw[i][j] += alpha * e[k] * p_old[i][j][k];

  /* Update rules. Computes p(t+1). */

#ifdef LINEAR
  loopk(num_outputs)
    yprime[k] = z[t][k];

  loopk(nrows-num_outputs)
    yprime[k+num_outputs] = y[k]*(1.0-y[k]);
#else
  loopk(nrows)
    yprime[k] = y[k]*(1.0-y[k]);   /* Uses y[k] = y(t+1). */
#endif

                        /* m = num_inputs + 1 */
  loopi(nrows)          /* nrows = num_nodes. */
    loopj(ncols)        /* ncols = num_nodes + */
      loopk(nrows) {    /*         num_inputs + 1 */
        kron = 0.0;     /* Kronecker delta function. */
        if(i==k) kron = 1.0;
        sum = 0.;
        loopl(num_nodes)
          sum += w[k][l+m]*p_old[i][j][l];   /* p_old = p(t). */
        p[i][j][k] = yprime[k]*(sum+kron*z[t][j]);   /* Uses z(t). */
      }   /* p[][][] is now for time p(t+1). */

  /* Update weights. Computes weights for time w(t+1). */

  loopij(nrows,ncols)
    w[i][j] += delw[i][j];

  /* Save partial derivatives for next iteration (time t+1) and reset
     p matrix by swapping the pointers of the old p matrix with the new
     p matrix. */

  p_temp = p_old;
  p_old = p;      /* p_old is now p(t+1). */
  p = p_temp;

  return;

}


void reset_delw_s()   /* Written 30 May 91, RLL. */
{
  /* Reset delta weights and input sum to zero for next calculation. */

  loopij(nrows,ncols)
    delw[i][j] = 0.;
  loopi(nrows)
    s[i] = 0.;
  return;

}

void reset_p()   /* Written 15 Jul 91, RLL. */
{
  /* Zero p_old[][][] for next calculation. */

  loopij(nrows,ncols)
    loopk(nrows)
      p_old[i][j][k]=0.;
  loopi(nrows)
    y[i] = 0.;
  return;

}


void save_weights()   /* Written 28 Jun 91, RLL. */
{
  FILE *afp;

  ofp=fopen("weights.dat","w");
  i = num_vectors - 1;
  loopj(ncols)
    fprintf(ofp,"%f ",z[i][j]);
  fprintf(ofp,"\n");
  loopi(nrows) {
    loopj(ncols)
      fprintf(ofp,"%f ",w[i][j]);
    fprintf(ofp,"\n");
  }
  fclose(ofp);

  afp=fopen("netout.dat","w");
  loopi(num_vectors) {
    loopj(num_outputs)
      fprintf(afp,"%f ",z[i][j+m]);
    fprintf(afp,"\n");
  }
  fclose(afp);

  afp=fopen("desired.dat","w");
  loopi(num_vectors) {
    loopj(num_outputs)
      fprintf(afp,"%f ",d[i][j]);
    fprintf(afp,"\n");
  }
  fclose(afp);

  return;

}

void read_weights()   /* Written 28 Jun 91, RLL. */
{
  ifp=fopen("weights.dat","r");
  i = 0;

  loopj(ncols)
    fscanf(ifp,"%f",&z[i][j]);
  loopi(nrows)
    loopj(ncols)
      fscanf(ifp,"%f",&w[i][j]);

  fclose(ifp);

  return;

}

void check_file()   /* Written 10 Jul 91, RLL. */
{
  FILE *afp;

  afp = fopen(datafile,"r");

  if(afp == NULL) {
    /*strcpy(afile, "File not found");*/
    printf("\n%s %s\n",datafile,": File not found.");
    exit(0);
  }
  else fclose(afp);
  return;

}


/* NRUTIL.C *********************************************************

   Utilities which create vectors, matrices, and
   3-D matrices.

*********************************************************************/

#include <malloc.h>
#include <stdio.h>

void nrerror(error_text)
char error_text[];
{
  void exit();

  fprintf(stderr,"Numerical Recipes run-time error...\n");
  fprintf(stderr,"%s\n",error_text);
  fprintf(stderr,"...now exiting to system...\n");
  exit(1);
}

float *vector(nl,nh)
int nl,nh;
{
  float *v;

  v=(float *)malloc((unsigned) (nh-nl+1)*sizeof(float));
  if (!v) nrerror("allocation failure in vector()");
  return v-nl;
}

int *ivector(nl,nh)
int nl,nh;
{
  int *v;

  v=(int *)malloc((unsigned) (nh-nl+1)*sizeof(int));
  if (!v) nrerror("allocation failure in ivector()");
  return v-nl;
}

double *dvector(nl,nh)
int nl,nh;
{
  double *v;

  v=(double *)malloc((unsigned) (nh-nl+1)*sizeof(double));
  if (!v) nrerror("allocation failure in dvector()");
  return v-nl;
}


float **matrix(nrl,nrh,ncl,nch)
int nrl,nrh,ncl,nch;
{
  int i;
  float **m;

  m=(float **) malloc((unsigned) (nrh-nrl+1)*sizeof(float*));
  if (!m) nrerror("allocation failure 1 in matrix()");
  m -= nrl;

  for(i=nrl;i<=nrh;i++) {
    m[i]=(float *) malloc((unsigned) (nch-ncl+1)*sizeof(float));
    if (!m[i]) nrerror("allocation failure 2 in matrix()");
    m[i] -= ncl;
  }
  return m;
}


/********************************************************************

   matrix3d() created by Randall Lindsey on 15 May 91 for
   use in recnet.c

*********************************************************************/

float ***matrix3d(nrl,nrh,ncl,nch,ndl,ndh)
int nrl,nrh,ncl,nch,ndl,ndh;
{
  int i,j;
  float ***m;

  m=(float ***) malloc((unsigned) (nrh-nrl+1)*sizeof(float**));
  if (!m) nrerror("allocation failure 1 in matrix3d()");
  m -= nrl;

  for(i=nrl;i<=nrh;i++) {
    m[i]=(float **) malloc((unsigned) (nch-ncl+1)*sizeof(float*));
    if (!m[i]) nrerror("allocation failure 2 in matrix3d()");
    m[i] -= ncl;
    for(j=ncl;j<=nch;j++) {
      m[i][j]=(float *) malloc((unsigned) (ndh-ndl+1)*sizeof(float));
      if (!m[i][j]) nrerror("allocation failure 3 in matrix3d()");
      m[i][j] -= ndl;
    }
  }
  return m;
}

double **dmatrix(nrl,nrh,ncl,nch)
int nrl,nrh,ncl,nch;
{
  int i;
  double **m;

  m=(double **) malloc((unsigned) (nrh-nrl+1)*sizeof(double*));
  if (!m) nrerror("allocation failure 1 in dmatrix()");
  m -= nrl;

  for(i=nrl;i<=nrh;i++) {
    m[i]=(double *) malloc((unsigned) (nch-ncl+1)*sizeof(double));
    if (!m[i]) nrerror("allocation failure 2 in dmatrix()");
    m[i] -= ncl;
  }
  return m;
}

int **imatrix(nrl,nrh,ncl,nch)
int nrl,nrh,ncl,nch;
{
  int i,**m;

  m=(int **)malloc((unsigned) (nrh-nrl+1)*sizeof(int*));
  if (!m) nrerror("allocation failure 1 in imatrix()");
  m -= nrl;

  for(i=nrl;i<=nrh;i++) {
    m[i]=(int *)malloc((unsigned) (nch-ncl+1)*sizeof(int));
    if (!m[i]) nrerror("allocation failure 2 in imatrix()");
    m[i] -= ncl;
  }
  return m;
}

/********************************************************************/


/* RAN1.C ***********************************************************
   Numerical Recipes pseudo-random number generator.

*********************************************************************/

#define M1 259200
#define IA1 7141
#define IC1 54773
#define RM1 (1.0/M1)

#define M2 134456
#define IA2 8121
#define IC2 28411
#define RM2 (1.0/M2)

#define M3 243000
#define IA3 4561
#define IC3 51349

extern float ran1(idum)
int *idum;
{
  static long ix1,ix2,ix3;
  static float r[98];
  float temp;
  static int iff=0;
  int j;
  void nrerror();

  if (*idum < 0 || iff == 0) {
    iff=1;
    ix1=(IC1-(*idum)) % M1;
    ix1=(IA1*ix1+IC1) % M1;
    ix2=ix1 % M2;
    ix1=(IA1*ix1+IC1) % M1;
    ix3=ix1 % M3;
    for (j=1;j<=97;j++) {
      ix1=(IA1*ix1+IC1) % M1;
      ix2=(IA2*ix2+IC2) % M2;
      r[j]=(ix1+ix2*RM2)*RM1;
    }
    *idum=1;
  }

  ix1=(IA1*ix1+IC1) % M1;
  ix2=(IA2*ix2+IC2) % M2;
  ix3=(IA3*ix3+IC3) % M3;
  j=1 + ((97*ix3)/M3);
  if (j > 97 || j < 1) nrerror("RAN1: This cannot happen.");
  temp=r[j];
  r[j]=(ix1+ix2*RM2)*RM1;
  return temp;
}

#undef M1
#undef IA1
#undef IC1
#undef RM1
#undef M2
#undef IA2
#undef IC2
#undef RM2
#undef M3
#undef IA3
#undef IC3

/********************************************************************/

The  following  additional  listing  is  a  supporting  data  file  required  for  the  recurrent 
network  program  to  work  properly. 


/* PARAMETERS.DAT ***************************************************/


3004.0  987654321


/********************************************************************/
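
In the listing above, init_net() reads this file with the format string "%d %f %d", filling epochs, alpha, and seed in that order, so the file is expected to hold three whitespace-separated fields. The following is a small sketch of that parsing step with a return-value check added. The helper name parse_params and the sample values in the usage note are hypothetical and do not come from the thesis.

```c
#include <stdio.h>

/* Hypothetical helper: parse "epochs alpha seed" from one line of text.
   Returns 1 when all three fields were read, 0 otherwise. */
int parse_params(const char *line, int *epochs, float *alpha, int *seed)
{
    return sscanf(line, "%d %f %d", epochs, alpha, seed) == 3;
}
```

For instance, calling parse_params("100 0.5 12345", &epochs, &alpha, &seed) with these made-up values succeeds, while a line holding a single number is rejected because fewer than three fields convert.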

Appendix C. Source Code for Creation of Data


This  appendix  contains  a  listing  of  the  source  code  which  generated  the  data  for 
testing  the  modified  recurrent  neural  network. 
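
The first data generator in this appendix drives a second-order recursive difference equation with uniform random input, using the coefficients a0, a1, a2, b0, and b1 that appear in the make_data listing. The block below is a minimal sketch of that difference equation in isolation. The function name lowpass2 and the constant names are assumptions for illustration, and the first two outputs are set to zero as the listing does.

```c
#include <stddef.h>

/* Coefficients as they appear in the make_data listing. */
static const float A0 = 0.0676f, A1 = 0.1352f, A2 = 0.0676f;
static const float B0 = 1.1422f, B1 = -0.4124f;

/* Filter n samples of x into y using
   y[i] = A0*x[i] + A1*x[i-1] + A2*x[i-2] + B0*y[i-1] + B1*y[i-2].
   The first two outputs are set to zero. */
void lowpass2(const float *x, float *y, size_t n)
{
    size_t i;

    for (i = 0; i < n && i < 2; i++)
        y[i] = 0.0f;
    for (i = 2; i < n; i++)
        y[i] = A0 * x[i] + A1 * x[i - 1] + A2 * x[i - 2]
             + B0 * y[i - 1] + B1 * y[i - 2];
}
```

With these coefficients the steady-state gain for a constant input is (A0 + A1 + A2) / (1 - B0 - B1), which is close to one, so a unit step input settles near 1.0.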


/* MAKE_DATA.C ******************************************************/

#include <stdio.h>
#include "macros.h"
#include "ran1.c"

FILE *ofp, *ifp;

void main(int argc, char *argv[])
{
  float class,junk[2][2048],x[2][2048],a0,a1,a2,b0,b1;
  int idum=1,i,j,bubba;

  switch (argc) {
    case 1:
    case 2:
      printf("\n%s\n\n","Usage: make_data filename.dat num_nodes [bin]");
      exit(0);
      break;

    case 3:
      a0=0.0676; a1=0.1352; a2=0.0676;
      b0=1.1422; b1=-0.4124;
      bubba=atoi(argv[2]);
      ofp=fopen(argv[1],"w");
      idum = -IABS(737496732);
      ran1(&idum);

      fprintf(ofp,"%d %d %d %d\n",1,1,bubba,100);

      x[1][0]=x[1][1]=0.0;

      loopi(100)
        x[0][i] = 2.0*ran1(&idum)-1.0;
      loopi(98)
        x[1][i+2]=a0*x[0][i+2]+a1*x[0][i+1]+a2*x[0][i]+b0*x[1][i+1]+b1*x[1][i];

      loopi(100)
        fprintf(ofp,"%f %f \n",x[0][i],x[1][i]);
      fclose(ofp);
      break;

    case 4:
      bubba=atoi(argv[2]);
      ofp=fopen(argv[1],"w");
      idum = -IABS(97475298);
      ran1(&idum);

      fprintf(ofp,"%d %d %d %d",1,1,bubba,2048);

      junk[1][0]=0.;

      loopi(2048)
        junk[0][i]=ran1(&idum);

      loopi(2048)
        junk[1][i]=junk[0][i];

      loopi(2048)
        fprintf(ofp,"\n%f %f ",junk[0][i],junk[1][i]);
      fclose(ofp);
      break;
  }
}

/********************************************************************/

/* XOR_DATA.C *******************************************************/

#include <stdio.h>
#include "macros.h"
#include "ran1.c"

FILE *ofp, *ifp;

void main(int argc, char *argv[])
{
  float class,junk[2][1024],seed;
  int idum=1,i,j,bubba;

  switch (argc) {
    case 1:
    case 2:
    case 3:
      printf("\n%s\n\n","Usage: make_data filename.dat num_nodes seed [bin]");
      exit(0);
      break;

    case 4:
      bubba=atoi(argv[2]);
      ofp=fopen(argv[1],"w");
      idum = -IABS(seed);
      ran1(&idum);

      fprintf(ofp,"%d %d %d %d",2,1,bubba,1024);
      loopi(1024) {
        loopj(2) {
          junk[j][i]=ran1(&idum);
          if (junk[j][i]>0.5) junk[j][i]=1.0;
          else junk[j][i]=0.0;
        }
        if (i < 2) class=1.0;
        else {
          if ((junk[0][i-2]>0.5) && (junk[1][i-2]>0.5)) class=0.0;
          if ((junk[0][i-2]<0.5) && (junk[1][i-2]>0.5)) class=1.0;
          if ((junk[0][i-2]>0.5) && (junk[1][i-2]<0.5)) class=1.0;
          if ((junk[0][i-2]<0.5) && (junk[1][i-2]<0.5)) class=0.0;
        }
        fprintf(ofp,"\n%f %f %f ",junk[0][i],junk[1][i],class);
      }
      fclose(ofp);
      break;

    case 5:
      bubba=atoi(argv[2]);
      ofp=fopen(argv[1],"w");
      idum = -IABS(seed);
      ran1(&idum);

      fprintf(ofp,"%d %d %d %d",2,1,bubba,1024);
      loopi(1024) {
        loopj(2)
          junk[j][i]=ran1(&idum);
        if (i < 2) class=1.0;
        else {
          if ((junk[0][i-2]>0.5) && (junk[1][i-2]>0.5)) class=0.0;
          if ((junk[0][i-2]<0.5) && (junk[1][i-2]>0.5)) class=1.0;
          if ((junk[0][i-2]>0.5) && (junk[1][i-2]<0.5)) class=1.0;
          if ((junk[0][i-2]<0.5) && (junk[1][i-2]<0.5)) class=0.0;
        }
        fprintf(ofp,"\n%f %f %f ",junk[0][i],junk[1][i],class);
      }
      fclose(ofp);
      break;
  }
}

/********************************************************************/


Appendix  D.  Utility  Source  Code 


This  appendix  contains  a  listing  of  the  utilities  source  code.  These  programs  were 
used  to  make  the  data  better  suited  to  the  neural  network  environment. 

/** STAT_NORM.C *****************************************************

   Performs statistical normalization on filename.dat and
   creates filename.dat.sn as its output.

*********************************************************************/

#include <stdio.h>
#include <math.h>

#define INPUTS 75   /* max number of features */

/** begin Main Program **/

void main (argc, argv)
int argc;
char *argv[];
{
  /*=================================================*/
  /*== local variables ==*/
  /*=================================================*/

  FILE *fopen();
  FILE *input, *fopen();
  char infile[50];
  FILE *output;
  char outfile[50];
  float value, trash;
  float deviation[INPUTS], average[INPUTS];
  int i, j, inputs, outputs, ivalue;
  int count1, count2, count, waste, temp;

  /*=================================================*/
  /*=== did user specify an input file ===*/

  if (argc != 2) {
    printf ("\n\nUsage -> stat_norm <filename>\n\n");
    /*=== exit after pointing out the error ==*/
    exit (1000);
  }


  /*=== user did specify an input file ===*/
  /*=================================================*/

  strcpy (infile, argv[1]);   /* use inputted name as base */

  /*=================================================*/
  /*=== Open Input File ===*/
  /*=================================================*/

  printf ("\nOpening Input File: %s\n\n", infile);

  if (!(input = fopen(infile, "rb")))
  {
    printf ("\nCan't open input file: %s\n\n", infile);
    exit (2000);
  }

  /*=================================================*/
  /*=== read the header information ===*/
  /*=================================================*/

  fscanf (input, "%d %d %d %d\n", &count1, &outputs, &inputs, &count);

  if (count1 < 0 || count2 < 0 || inputs < 0 || outputs < 0)
  {
    printf ("One of the header inputs is negative\n\n");
    exit (3000);
  }

  printf ("There are %d training vectors\n", count);
  printf ("There are %d test vectors\n", count);
  printf ("There are %d inputs\n", inputs);
  printf ("There are %d outputs\n\n", outputs);

  /* count = count1 + count2; */


  /*=================================================*/
  /*=== initialize things ===*/
  /*=================================================*/

  for (i = 0; i < inputs; i++)
  {
    average[i] = deviation[i] = 0.0;
  }

  /*== loop until all data has been read in ==*/

  printf ("Reading the Data\n\n");

  for (i = 0; i < count; i++)
  {
    /*fscanf (input, "%d ", &trash);*/   /* read line counter */

    for (j = 0; j < inputs; j++)
    {
      fscanf (input, "%f ", &value);   /* read float values */
      average[j] += value;
    }

    /* for (j = 0; j < outputs-1; j++)
       {
         fscanf (input, "%f ", &value);
       } */

    /* fscanf (input, "%f\n\n", &value); */
  }

  fclose (input);

  /*=================================================*/
  /*=== calculate the averages ===*/

  printf ("Calculating Averages\n\n");

  for (i = 0; i < inputs; i++)
  {
    average[i] /= (float)count;
  }

  /*=================================================*/
  /*=== Re-Open the input file ===*/

  printf ("Re-Opening Input File: %s\n\n", infile);

  if (!(input = fopen(infile, "rb")))
  {
    printf ("\nCan't re-open input file: %s\n\n", infile);
    exit (2000);
  }

  /*=================================================*/
  /*=== throw away the header information this time ===*/
  /*=================================================*/

  fscanf (input, "%d %d %d %d\n", &waste, &waste, &waste, &waste);

  /*=================================================*/
  /*== loop until all data has been read in ==*/
  /*=================================================*/

  printf ("Reading the Data\n\n");

  for (i = 0; i < count; i++)
  {
    /* fscanf (input, "%d ", &trash); */   /* read line counter */

    for (j = 0; j < inputs; j++)
    {
      fscanf (input, "%f ", &value);   /* read float values */

      value -= average[j];     /* subtract off the average */
      value *= value;          /* square the result */

      deviation[j] += value;   /* hang onto it until all done */
    }

    /* for (j = 0; j < outputs-1; j++)
       {
         fscanf (input, "%f ", &value);
       } */

    /* fscanf (input, "%f\n\n", &value); */
  }

  fclose (input);

  /*=================================================*/
  /*=== calculate the standard deviation ==*/
  /*=================================================*/

  printf ("Calculating Standard Deviations\n\n");

  for (i = 0; i < inputs; i++)
  {
    deviation[i] /= count - 1;
    deviation[i] = (float)sqrt((double)deviation[i]);
  }


  /*=================================================*/
  /*== make output-file name ==*/
  /*=================================================*/

  sprintf (outfile, "%s.sn", argv[1]);

  /*=================================================*/
  /*=== Open Output File ===*/
  /*=================================================*/

  printf ("Opening Output File: %s\n\n", outfile);

  if (!(output = fopen(outfile, "wb")))
  {
    printf ("\nCan't open output file: %s\n\n", outfile);
    exit (2000);
  }

  /*== Re-open the input file ==*/
  /*=================================================*/

  printf ("Re-Opening Input File (last time): %s\n\n", infile);

  if (!(input = fopen(infile, "rb")))
  {
    printf ("\nCan't re-open input file: %s\n\n", infile);
    exit (2000);
  }

  /*=================================================*/
  /*=== read and save header ===*/
  /*=================================================*/

  fscanf (input, "%d %d %d %d\n", &count1, &outputs, &inputs, &count);
  fprintf (output, "%d %d %d %d\n", count1, outputs, inputs, count);

  /*=================================================*/
  /*=== read data in, modify it, save it back out ===*/
  /*=================================================*/

  printf ("Reading, Modifying and Re-Saving the Data\n\n");

  for (i = 0; i < count; i++)
  {
    /*fscanf (input, "%d ", &ivalue);*/     /* read line counter */
    /*fprintf (output, "%d ", ivalue);*/    /* save line counter */

    for (j = 0; j < inputs; j++)
    {
      fscanf (input, "%f ", &value);    /* read float value */

      value -= average[j];              /* modify the value */
      value /= deviation[j];

      fprintf (output, "%f ", value);   /* save modified value */
    }

    fprintf (output, "\n");
    /* for (j = 0; j < outputs-1; j++)
       {
         fscanf (input, "%f ", &value);
         fprintf (output, "%f", value);
       }
       fscanf (input, "%d\n\n", &ivalue);
       fprintf (output, "%d\n\n", ivalue); */
  }

  fclose (input);
  fclose (output);

  /*=================================================*/
  /*=== we're done ===*/

  printf ("Finished.\n\n");

}


/** FFT.C ***********************************************************
   Fast Fourier Transform Program
*********************************************************************/

#include <stdio.h>
#include <math.h>

#define loopi(A) for(i=0;i<(A);i++)

#define loopj(A) for(j=0;j<(A);j++)

#define loopij(A,B) for (i=0; i<(A); i++)\
                    for (j=0; j<(B);j++);

#define SQ(A) (A*A)

#define PI 3.1415926

main(int argc,char *argv[])
{
  FILE *fin, *fout;
  float *output,*input,*trunc_out;
  float norm;
  float *vector();

  /*void doflip();*/
  void fourn();

  /*void truncate();*/

  /*void *free_vector();*/
  char name[30];

  int i,j, nn[1], ndim, isign, new_order, order, image_size;

  if(argc != 3) {
    printf("!!! The command line should be !!! : \n\n fft_trunc infile outfile \n\n");
    exit(0);
  }

printfit" !  !  !  Input  the  input  images  SIZE  and  ORDER:  "); 
scanf( "  %d%d "  ,&image_size,«feorder); 

/+*s^*+j(<***+****+**)(tset  up  dynamic  allocation****************’^ 

input  =  vector(0,2*image_size*image_size-l); 
output  =  vector(0,image_size*image_size-l); 


/***>)c*:«c>)c*:tt*t*******  Set  Up  Files  ***********************^1/ 

if  ((fin=fopen(argv[l],"r "))  =  NULL)  { 
printf("I  can't  open  the  input  file"); 
exit(-l); 

} 

if  ((fout-fopen(argv[2],"w"))  =  NULL){ 
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printfC'I  can't  open  the  output  file"); 
exit(-l); 

} 


/********************** Read File ************************/

loopi(2*image_size*image_size-1)   /* initialize array to zero */
    input[i] = 0.0;

loopi(image_size*image_size-1)     /* read data in the fourn format */
    fscanf(fin, "%f\n", &input[i*2]);   /* see numerical recipes in c */

fclose(fin);                       /* close input file */

/********** Initialization parameters for FFT ************/

nn[0]=image_size;   /* size of input IAW fourn() */
nn[1]=image_size;

ndim=1;             /* one dim FFT */
/* ndim=2; */       /* two dim FFT */

isign=1;            /* FFT */

fourn(input-1, nn-1, ndim, isign);


/**************** Find Fourier Magnitude ****************/
j=0;

for(i=0;i<(2*image_size*image_size-1); i+=2) {
    output[j]=sqrt((double)SQ(input[i])+(double)SQ(input[i+1]));
    j++;
}

norm=output[0];     /* d.c component used for normalization */
printf("%4.0f\n",norm);

/***** normalize and write output of FFT in argv[2] file *****/

loopi(image_size*image_size) {
    output[i]=output[i]/norm;
    fprintf(fout, "%1.4f\n", output[i]);
}

fclose(fout);

/************************ doflip ************************/

/*doflip(output,image_size); */ /* converts fourn format to human format */
/*printf("%4.4f\n",output[8128]);*/

/*********************** truncate ***********************

truncate takes fft(output) of size(image_size) and truncates the
FFT to order specified plus d.c. the array is returned in
trunc_out, the argv[2] is used as a header when truncate writes
the output in netfft.dat

*********************************************************/

if (order != 0) {
    new_order = 2*order+1;

    trunc_out = vector(0,image_size*image_size-1);
    truncate(output,image_size,order,trunc_out, argv[2]);
    free_vector(trunc_out,0,image_size*image_size-1);
}

free_vector(input,0,2*image_size*image_size-1);
free_vector(output,0,image_size*image_size-1);

}


/**********************************************************************

NAME: fourn.c

DESCRIPTION: Numerical Recipes multi-dimensional FFT routine.
Requires a complex column vector as follows:

/real a(1)/
/complex a(1)/
/real a(2)/
/complex a(2)/
/etc/

SUBROUTINES CALLED:

WRITTEN BY: Numerical Recipes in C

**********************************************************************/

#include <math.h>

#define SWAP(a,b) tempr=(a);(a)=(b);(b)=tempr

void fourn(data,nn,ndim,isign)
float data[];
int nn[],ndim,isign;
{
    int i1,i2,i3,i2rev,i3rev,ip1,ip2,ip3,ifp1,ifp2;
    int ibit,idim,k1,k2,n,nprev,nrem,ntot;
    float tempi,tempr;
    double theta,wi,wpi,wpr,wr,wtemp;

    ntot=1;
    for (idim=1;idim<=ndim;idim++)
        ntot *= nn[idim];
    nprev=1;
    for (idim=ndim;idim>=1;idim--) {
        n=nn[idim];
        nrem=ntot/(n*nprev);
        ip1=nprev << 1;
        ip2=ip1*n;
        ip3=ip2*nrem;
        i2rev=1;
        for (i2=1;i2<=ip2;i2+=ip1) {
            if (i2 < i2rev) {
                for (i1=i2;i1<=i2+ip1-2;i1+=2) {
                    for (i3=i1;i3<=ip3;i3+=ip2) {
                        i3rev=i2rev+i3-i2;
                        SWAP(data[i3],data[i3rev]);
                        SWAP(data[i3+1],data[i3rev+1]);
                    }
                }
            }
            ibit=ip2 >> 1;
            while (ibit >= ip1 && i2rev > ibit) {
                i2rev -= ibit;
                ibit >>= 1;
            }
            i2rev += ibit;
        }
        ifp1=ip1;
        while (ifp1 < ip2) {
            ifp2=ifp1 << 1;
            theta=isign*6.28318530717959/(ifp2/ip1);
            wtemp=sin(0.5*theta);
            wpr = -2.0*wtemp*wtemp;
            wpi=sin(theta);
            wr=1.0;
            wi=0.0;
            for (i3=1;i3<=ifp1;i3+=ip1) {
                for (i1=i3;i1<=i3+ip1-2;i1+=2) {
                    for (i2=i1;i2<=ip3;i2+=ifp2) {
                        k1=i2;
                        k2=k1+ifp1;
                        tempr=wr*data[k2]-wi*data[k2+1];
                        tempi=wr*data[k2+1]+wi*data[k2];
                        data[k2]=data[k1]-tempr;
                        data[k2+1]=data[k1+1]-tempi;
                        data[k1] += tempr;
                        data[k1+1] += tempi;
                    }
                }
                wr=(wtemp=wr)*wpr-wi*wpi+wr;
                wi=wi*wpr+wtemp*wpi+wi;
            }
            ifp1=ifp2;
        }
        nprev *= n;
    }
}

#undef SWAP


/** COSMatrix.c creates the Cosine matrix. ****************

Written By: Jim Goble
Date: 1 July, 1991

Version: 1.0

***********************************************************/

#include <math.h>

#include <stdio.h>

#define PI 3.14159265

main()
{

FILE *Cfile, *ofp;      /* My storage file pointer */
double temp, NN;
int m, n, M, N, X, Y;

printf("!!! Input the desired number of rows:");
scanf("%d", &N);
printf("\n");

/*printf("!!! Input the desired number of cycles in Y:");
scanf("%d", &Y);
printf("\n");*/

printf("!!! Input the desired number of cycles in X:");
scanf("%d", &X);
printf("\n");

M = N;
NN = N;

M = M - 1;              /* decrement variables */
N = N - 1;

/* Open Cfile for writing. Note there is no error checking! */

Cfile = fopen("cos.dat", "w");

/* Compute the CMatrix */

for (m = 0; m <= M; ++m) {
    temp = cos((2*X*m*PI)/NN);
    fprintf(Cfile, "%-8.7f\n", temp);
}                       /* end of m for loop */
fclose(Cfile);

}                       /* end of program */

/**********************************************************/

Appendix E. Statistical Prediction Algorithm and Source Code


This appendix contains an overview of the statistical prediction algorithm used to
compare with the recurrent network prediction. Following the description of the algorithm
is a listing of the C source code which implements the statistical prediction algorithm.

E.1 Statistical Prediction Algorithm

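Given an ergodic signal described by the function $x(t)$, the predicted future value
$x'(t_2)$ of $x(t)$ at time $t_2$ is given by

$$x'(t_2) = E[x_{t_2}] + \frac{\mathrm{cov}(x_{t_1}, x_{t_2})}{\mathrm{var}(x_{t_1})} \left( x(t_1) - E[x_{t_1}] \right) \qquad (14)$$

where the expectation values (means) $E[x_{t_1}]$ and $E[x_{t_2}]$ are given by

$$E[x_{t_1}] = E[x_{t_2}] = \frac{1}{N} \sum_{i=1}^{N} x_i \qquad (15)$$

and the variance $\mathrm{var}(x_{t_1})$ is given by

$$\mathrm{var}(x_{t_1}) = E[x_{t_1}^2] - E^2[x_{t_1}] \qquad (16)$$

and the covariance $\mathrm{cov}(x_{t_1}, x_{t_2})$ is given by

$$\mathrm{cov}(x_{t_1}, x_{t_2}) = E[x_{t_1} x_{t_2}] - E^2[x_{t_1}] \qquad (17)$$

For the variance, the expectation value (mean) of $x_{t_1}^2$ is given by

$$E[x_{t_1}^2] = \frac{1}{N} \sum_{i=1}^{N} x_i^2 \qquad (18)$$

and for the covariance, $E[x_{t_1} x_{t_2}]$ is given by

$$E[x_{t_1} x_{t_2}] = \frac{1}{N} \sum_{i=1}^{N} x_i x_{i+k} \qquad (19)$$

where $k$ is some constant time in the future.

The measure of performance is the mean squared error between the predicted value
and the actual value at some time in the future. This error is given by

$$\mathrm{error} = e = E[(x'(t_2) - x(t_2))^2] \qquad (20)$$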
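Equations (14) through (20) can be sketched compactly in C. The following is a minimal illustration under stated assumptions, not the thesis program itself: the names lp_stats, lp_estimate, lp_predict and lp_mse are hypothetical, the look-ahead k is measured in samples, and the signal array is assumed to hold at least n + k values so that x[i + k] is always valid.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Summary statistics used by the predictor (hypothetical struct). */
typedef struct {
    double mean;   /* E[x], eq. (15)                         */
    double var;    /* E[x^2] - E^2[x], eqs. (16) and (18)    */
    double cov;    /* E[x_i x_{i+k}] - E^2[x], eqs. (17), (19) */
} lp_stats;

/* Estimate the statistics from x[0..n+k-1], averaging over n samples. */
lp_stats lp_estimate(const double *x, size_t n, size_t k)
{
    lp_stats s = {0.0, 0.0, 0.0};
    double sum = 0.0, sum_sq = 0.0, sum_lag = 0.0;
    size_t i;

    assert(n > 0);
    for (i = 0; i < n; i++) {
        sum     += x[i];
        sum_sq  += x[i] * x[i];
        sum_lag += x[i] * x[i + k];
    }
    s.mean = sum / (double)n;
    s.var  = sum_sq / (double)n - s.mean * s.mean;
    s.cov  = sum_lag / (double)n - s.mean * s.mean;
    return s;
}

/* Eq. (14): predict the value k samples ahead of x_t1.
   The caller must ensure s->var is nonzero. */
double lp_predict(const lp_stats *s, double x_t1)
{
    return s->mean + (s->cov / s->var) * (x_t1 - s->mean);
}

/* Eq. (20): mean squared error between predicted and actual values. */
double lp_mse(const lp_stats *s, const double *x, size_t n, size_t k)
{
    double acc = 0.0;
    size_t i;

    for (i = 0; i < n; i++) {
        double e = x[i + k] - lp_predict(s, x[i]);
        acc += e * e;
    }
    return acc / (double)n;
}
```

For an alternating signal such as 1, -1, 1, -1, ... with k = 1, the estimated mean is 0, the variance is 1 and the covariance is -1, so the predictor returns the negated current sample and the error of equation (20) is zero.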


E.2  Source  Code  Listing 

The following source code listing is a C implementation of the statistical prediction
algorithm previously outlined. It requires two separate files to be present in the same
directory: "sp_defs.h", a declarations file for the main program, and "param_sp.dat", a
parameter file for declaring variable arrays. The functions used from "NRUTIL.C" are
listed in Appendix B.

/* SP.C ************************************************************

Statistical Prediction Software. This program performs
the best linear prediction for any ergodic function. The
default input datafile name is "data.dat" but you can use
any filename desired as long as it is passed to SP at the
command line. The results printed to the default
display are self explanatory.

Required input files: sp_defs.h, and param_sp.dat
Files created: stat_results.dat, stat_des.dat
stat_out.dat, and stat_error.dat

date: 17 Oct 91

written by: Randall L. Lindsey, GEO-91D

********************************************************************/

#include <stdio.h>
#include <stdlib.h>
#include "macros.h"
#include <math.h>
#include "sp_defs.h"
#include <string.h>

void main(int argc, char *argv[])
{
    switch (argc) {
    case 1:
        datafile="data.dat";
        check_file();
        initialize();
        read_data();
        MEAN();
        MEAN_SQ();
        VAR();
        COV();
        compute_output();
        compute_error();
        print_results();
        break;

    case 2:
        datafile=argv[1];
        check_file();
        initialize();
        read_data();
        MEAN();
        MEAN_SQ();
        VAR();
        COV();
        compute_output();
        compute_error();
        print_results();
        break;

    case 3:
    default:
        printf("\nUsage: sp [datafilename.dat] \n\n");
        break;
    }

} /* End MAIN() of SP.C */

void initialize()
{
    /* Read data from the input file "param_sp.dat" */
    printf("%s","Init...");
    ifp=fopen("param_sp.dat","r");
    fscanf(ifp,"%d %f %d %d",&epochs,&alpha,&seed,&look_ahead);
    fclose(ifp);

    /* Read data from the input file datafile (user specified) */
    ifp=fopen(datafile,"r");
    fscanf(ifp,"%d %d %d",&num_inputs,&num_outputs,&num_nodes);
    fscanf(ifp,"%d",&num_vectors);
    fclose(ifp);

    m = num_inputs + 1;         /* # of external inputs */
    nrows = n = num_nodes;      /* # of rows for weight matrix */
    ncols = m + num_nodes;      /* # of cols for weight matrix */

    /* Allocate memory for vectors and matrices */
    e=vector(0,num_vectors);                /* error vector, one entry per input vector */
    z=vector(0,num_vectors+look_ahead);     /* input vector array */
    y=vector(0,num_vectors);                /* output vector array */
    d=vector(0,num_vectors+look_ahead);     /* desired output array */

    /* Initialize variables to zero */
    J[0]=J[1]=0.0;

    loopi(num_vectors)
        e[i] = y[i] = d[i] = z[i] = 0.;

    return;
}


/****************************************************************/

void read_data()
{
    ifp=fopen(datafile,"r");
    fskip_line(ifp);

    loopi(num_vectors+look_ahead)
        fscanf(ifp,"%f %f",&z[i],&d[i]);
    fclose(ifp);
    return;
}


/****************************************************************/

void MEAN()
{
    float X=0.;
    loopi(num_vectors)
        X+=z[i];

    mean = 1.0/(float)num_vectors * X;
    printf("%s = %f\n","Mean",mean);
    return;
}


/****************************************************************/

void MEAN_SQ()
{
    float X=0.;
    loopi(num_vectors)
        X+=z[i]*z[i];

    mean_sq = 1.0/(float)num_vectors * X;
    printf("%s = %f\n","Mean Sq",mean_sq);
    return;
}


/****************************************************************/

void VAR()
{
    var = mean_sq - mean*mean;
    printf("%s = %f \n","Var",var);
    return;
}


/****************************************************************/

void COV()
{
    float X=0.;
    loopi(num_vectors)
        X += z[i]*z[i+look_ahead];

    cov = (1.0/(float)num_vectors * X) - (mean*mean);
    printf("%s = %f\n","Cov",cov);
    return;
}


void compute_output()
{
    printf("%s\n","Output");
    loopi(num_vectors)
        y[i] = mean+(cov/var)*(z[i]-mean);
    return;
}


/****************************************************************/

void compute_error()
{
    float X=0.;
    loopi(num_vectors){
        e[i] = z[i+look_ahead] - y[i];
        X += e[i] * e[i];
    }

    error = 1.0/(float)num_vectors * X;
    printf("%s = %f\n","Error ",error);
    return;
}


void print_results()
{
    printf("%s","Printing results...");
    ofp = fopen("stat_results.dat","w");
    loopi(num_vectors)
        fprintf(ofp,"% f % f % f\n",y[i],d[i],e[i]);
    fclose(ofp);

    ofp = fopen("stat_out.dat","w");
    loopi(num_vectors)
        fprintf(ofp,"% f\n",y[i]);
    fclose(ofp);

    ofp = fopen("stat_des.dat","w");
    loopi(num_vectors)
        fprintf(ofp,"% f\n",d[i]);
    fclose(ofp);

    ofp = fopen("stat_error.dat","w");
    loopi(num_vectors)
        fprintf(ofp,"% f\n",e[i]);
    fclose(ofp);

    printf("%s\n","Done.");
}


void check_file()   /* Written 10 Jul 91, RLL. */
{
    FILE *afp;

    afp = fopen(datafile,"r");
    if (afp == NULL) {
        /*strcpy(afile, "File not found");*/
        printf("\n%s %s\n",datafile,": File not found.");
        exit(0);
    }
    else fclose(afp);
    return;
}


/* SP_DEFS.H ****************************************************

File containing function declarations and variable
declarations for the main program called sp.c.

date: 17 Oct 91

written by: Randall L. Lindsey

*****************************************************************/

float *vector();

FILE *ifp, *ofp, *ifp1, *ofp1;
int run=1;

char str[80], *datafile;

int nrows, ncols, i, j, k, l, m, n;

int epochs, a, b, t, look_ahead;

int num_inputs, num_outputs, num_nodes, num_vectors, seed;

float alpha, alpha1, J[2], sum, mean, mean_sq, var, cov, error;

float *e, *z, *y, *d;

void MEAN();
void MEAN_SQ();
void VAR();
void COV();
void initialize();
void read_data();
void compute_error();
void print_results();
void check_file();
void compute_output();

/****************************************************************/

/** PARAM_SP.DAT ************************************************
200 3.0 153

*****************************************************************/